Abstract

The Metropolis-Adjusted Langevin Algorithm (MALA), originally introduced to 
sample exactly the invariant measure of certain stochastic differential equations 
(SDE) on infinitely long time intervals, can also be used to approximate pathwise 
the solution of these SDEs on finite time intervals. However, when applied to an 
SDE with a nonglobally Lipschitz drift coefficient, the algorithm may not have 
a spectral gap even when the SDE does. This paper reconciles MALA's lack of 
a spectral gap with its ergodicity to the invariant measure of the SDE and finite 
time accuracy. In particular, the paper shows that its convergence to equilibrium 
happens at exponential rate up to terms exponentially small in time-stepsize. This 
quantification relies on MALA's ability to exactly preserve the SDE's invariant 
measure and accurately represent the SDE's transition probability on finite time 
intervals. 

Keywords: Stochastic Differential Equations, Metropolis-Hastings algorithm, Weak Accuracy, Spec- 
tral Gap, Geometric Ergodicity 

Subject classification: 60J05 (65C30, 65C05)

1 Introduction 

The Metropolis-Adjusted Langevin Algorithm (MALA), originally proposed by
Roberts and Tweedie [RT96b, RT96a], is a technique to sample exactly complex,
high-dimensional probability distributions. MALA fits the general framework of
the Metropolis-Hastings method [MRTT53, Has70] and can be viewed as a special
case of smart and hybrid Monte-Carlo algorithms [RDF78, DKPR87]. The main
idea of MALA is to obtain the proposal moves from the forward Euler discretization
of an SDE whose invariant measure is the target distribution one seeks to
sample. Besides being ergodic with respect to this invariant measure by construction,
MALA was recently shown to also capture the dynamical behavior of the
solutions to the SDE [BV10]. Therefore MALA has the nice feature that it can be
used to estimate finite time dynamical properties along infinitely long trajectories
of ergodic SDEs.

Still, one issue with MALA is its theoretical rate of convergence; see, for example,
[RT96a, CWG+08]. When applied to measures with tails that are lighter than
Gaussian, it is known that MALA does not exhibit a geometric rate of convergence 
to equilibrium even though the exact solution to the SDE does. The main reason is 
that the proposal moves generated by forward Euler are not globally stable. Indeed 
for any time-stepsize one can find an energy value above which the drift in forward 
Euler gives proposed moves that increase the energy, in contrast to the exact drift 
in the SDE which always centers the solution towards lower energy values. Since 
higher energy values have a lower equilibrium probability weight, these proposed 
moves are typically rejected. While these rejections ensure that MALA is ergodic, 
at high energy values they prevent MALA from having a spectral gap. 

The question we investigate in this paper is how severe this problem is in prac- 
tical applications. Above we have argued that the main cause of the lack of geo- 
metric convergence is the behavior of the chain at high energy values. Since the 
chain is unlikely to reach such high energy values over finite time horizons, one 
does not expect their influence to be significant. In practice, it is the behavior of 
MALA on finite but very long times that is of interest, since this behavior is what 
one would experience when running the algorithm on a computer. The goal of this 
article is to quantify the non-asymptotic behavior of MALA. 

The main result of this paper states that the convergence of MALA to its equi- 
librium distribution happens at exponential rate up to terms exponentially small 
in time-stepsize. This can be formulated in the following way, and will later be 
reformulated rigorously as Theorem 3.1.

Claim. Let $P_h^n$ denote the $n$-step transition probability of MALA and $\mu$ its
equilibrium measure. Set $P = P_h^{\lfloor 1/h \rfloor}$. Under natural assumptions on the target
distribution $\mu(dx) = Z^{-1} \exp(-U(x))\,dx$ (see Assumption 2.1), for $h$ small enough
and for all $x \in \mathbb{R}^n$ satisfying $U(x) < E_0$, there exist positive constants $\rho \in (0, 1)$,
$C_1(E_0)$ and $C_2$ independent of $h$ such that the bound

$$\|P^k(x, \cdot) - \mu\|_{TV} \le C_1(E_0)\left(\rho^k + e^{-C_2/h^{1/4}}\right), \tag{1.1}$$

holds for all $k \in \mathbb{N}$.

Observe from (1.1) that the distance of MALA to equilibrium is bounded by
the sum of two terms. The first term converges to 0 exponentially fast and essentially
gives the speed of convergence to equilibrium for the exact solution to the
underlying SDE. The second term, on the other hand, remains bounded away from
0 as $k \to \infty$. This term arises from the lack of a spectral gap in MALA, but its
important feature is that it is exponentially small in $h$. Therefore, it
will be negligible for most practical purposes.
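To get a feeling for the relative size of the two terms, the following sketch evaluates them for a few stepsizes. The values `rho = 0.5` and `c2 = 1.0` are placeholders chosen purely for illustration; the analysis does not provide explicit values for these constants.

```python
import math

def bound_terms(k, h, rho=0.5, c2=1.0):
    """Return the two terms of the bound: (rho**k, exp(-c2 / h**(1/4))).

    rho and c2 are hypothetical placeholder constants, not values from the paper.
    """
    decaying = rho ** k
    floor = math.exp(-c2 / h ** 0.25)
    return decaying, floor

# The floor term shrinks very quickly as the stepsize h decreases.
for h in (1e-2, 1e-4, 1e-8):
    decaying, floor = bound_terms(k=50, h=h)
    print(f"h={h:g}  rho^k={decaying:.3e}  exp(-c2/h^(1/4))={floor:.3e}")
```

With these placeholder constants, the second term dominates once $k$ is moderately large, but it is itself already tiny for small $h$.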

The crux of the proof is the demonstration that MALA inherits some of the 
convergence properties of the solution to the underlying SDE up to exponentially 
small terms. This proof relies on finite time accuracy of MALA, ergodicity of
MALA with respect to the exact equilibrium measure of the SDE, and an application
of Harris' theorem. In fact, if MALA did not exactly preserve the equilibrium
measure of the SDE, the second term in (1.1) would not be exponentially small in
the time-stepsize. For example, if MALA were replaced simply by the uncorrected
Euler approximation to the SDE, then one would expect the size of the error term
to be $O(h)$.

The estimate (1.1) does not imply that MALA fails to converge to the equilibrium
of the SDE. In fact, it is known [RT96a] that the TV distance between MALA
and the equilibrium measure vanishes in the limit as $k \to \infty$. However, this asymptotic
property provides no insight into the nonasymptotic behavior of MALA, which
is the main focus of this paper. In fact, even though the upper bound in (1.1) does
not converge to zero in the limit $k \to \infty$, it is the sharpest known bound on finite
time intervals.

The power 1/4 in the exponentially small term in (1.1) is due to the second-order
weak accuracy of the proposal moves generated by the forward Euler scheme,
and the conditions we impose on the potential energy. In particular, it can be
traced back to the factor $U^4(x)$ appearing in the statement of
Lemma 5.3. Under the assumptions made in this paper, this power is sharp.

At the technical level, the main novelty of the proof of our result is twofold. 
First, we prove finite-time accuracy of MALA in the total variation norm in our 
setting. While accuracy in total variation of the forward Euler algorithm is known 
[BT95], it is essential for our analysis to cover situations where the drift of the underlying
SDE is not globally Lipschitz continuous. Furthermore, we need to keep
track of the dependency of the error estimates with respect to the initial condition. 
The main idea for this result is to first obtain an error estimate in some weaker 
Wasserstein distance, and then to strengthen this into a total variation estimate by 
making use of the regularising properties of the one-step transition probabilities of 
the forward Euler algorithm. Second, we show that on a very large set, MALA 
admits a Lyapunov function of the type $\Phi(x) = \exp(\theta U(x))$ for suitable $\theta > 0$.
Since U is allowed to grow much faster than quadratically at infinity, this Lyapunov 
function fails to be integrable with respect to any Gaussian measure, including of 
course the transition probabilities of forward Euler. While this leads to technical 
complications, having such a fast-growing Lyapunov function is a crucial ingredi- 
ent of our proof, as this is the key to obtaining bounds that are exponentially small 
in h. 

The remainder of this paper is organized as follows. In Section 2 we will
state the main assumptions required for the proof of our main result. Along the
way, we recall that MALA is ergodic. In Section 3 the proof of the main result
is provided. This proof relies crucially on comparison with a 'patched' MALA
algorithm, where the chain is reflected at the boundaries of a large level set. The
accuracy of this patched algorithm is investigated in Section 4. Finally, Section 5
shows that $\Phi$ is a Lyapunov function for the MALA algorithm (at least on a large
domain), which provides the strong a priori bounds required for our analysis.
2 A short overview of the MALA algorithm
2.1 Overdamped Langevin equations 

In this paper we focus on overdamped Langevin dynamics on an energy landscape
defined by a potential energy function $U \in C^4(\mathbb{R}^n, \mathbb{R})$:

$$dY = -\nabla U(Y)\,dt + \sqrt{2\beta^{-1}}\,dW, \quad Y(0) = x \in \mathbb{R}^n. \tag{2.1}$$

Here $\nabla U : \mathbb{R}^n \to \mathbb{R}^n$ denotes the gradient of the function $U$, $W$ is a standard
$n$-dimensional Wiener process, or Brownian motion, and $\beta > 0$ is a parameter
referred to as the inverse temperature. Under certain regularity conditions on the
potential energy stated in Assumption 2.1 below, the solution to (2.1) is geometrically
ergodic with an invariant probability measure $\mu$ that possesses the following
density $\pi(x)$ with respect to Lebesgue measure [Has80, RT96a]:

$$\pi(x) = Z^{-1} \exp(-\beta U(x)), \tag{2.2}$$

where $Z = \int_{\mathbb{R}^n} \exp(-\beta U(x))\,dx$.

Before stating assumptions on the potential energy, let us fix some notation. For
a function $G \in C^r(\mathbb{R}^n, \mathbb{R})$ and an integer $r \ge 1$, let $\nabla G$ and $D^r G$ be the gradient
and the $r$th derivative of $G$, respectively. Let $|\cdot|$ denote the Euclidean vector norm
and $\|\cdot\|$ the Frobenius norm. Let $\mathcal{L}$ denote the generator of (2.1), defined for any
$G \in C^2(\mathbb{R}^n, \mathbb{R})$ as

$$\mathcal{L}G(x) = -\nabla U(x) \cdot \nabla G(x) + \beta^{-1} \Delta G(x). \tag{2.3}$$

For any $t > 0$, let $Q_t$ denote the transition probabilities of $Y$. We will generally
abuse notation and use the same symbol for a Markov transition kernel
and the associated Markov operator. That is, for any bounded measurable function
$\varphi : \mathbb{R}^n \to \mathbb{R}$, we define $Q_t \varphi : \mathbb{R}^n \to \mathbb{R}$ as

$$(Q_t \varphi)(x) = \int_{\mathbb{R}^n} Q_t(x, dy)\,\varphi(y).$$

Throughout this article, we will make the following assumptions on the potential 
energy. Not all of these assumptions will be required for every statement, but we 
find it notationally convenient to have a single set of assumptions to refer to. 
Assumption 2.1. The potential energy $U \in C^4(\mathbb{R}^n, \mathbb{R})$ satisfies the following.

A) One has $U(x) \ge 1$ and, for any $C > 0$, there exists an $E > 0$ such that

$$U(x) \ge C(1 + |x|^2),$$

for all $x$ satisfying $U(x) > E$.

B) There exist constants $c \in (0, \beta)$, $d > 0$ and $E > 0$ such that

$$\Delta U(x) \le c|\nabla U(x)|^2 - dU(x), \tag{2.4}$$
for all $x \in \mathbb{R}^n$ satisfying $U(x) > E$.

C) The Hessian of $U$ is bounded from below in the sense that there exists $C > 0$
such that

$$D^2 U(x)(\eta, \eta) \ge -C|\eta|^2,$$
uniformly for all $x, \eta \in \mathbb{R}^n$.

D) There exists a constant $C > 0$ such that the first four derivatives of the
potential energy $U \in C^4(\mathbb{R}^n, \mathbb{R})$ are bounded by the potential energy itself,
that is

$$\|D^4 U(x)\| \vee \|D^3 U(x)\| \vee \|D^2 U(x)\| \vee |\nabla U(x)| \le C U(x),$$

for all $x \in \mathbb{R}^n$. Recall that the operator $\vee$ returns the argument with the maximum
value.

Remark 2.2. It follows immediately from Assumption 2.1 (A) above that there exists a
constant $E_c > 0$ such that

$$\mu(\{U(x) > E\}) \le e^{-\beta E/2}, \tag{2.5}$$
for all $E > E_c$. Indeed, it suffices to note that

fi({U(x) > E}) = \ e~ mx) dx < / e~ mx) ' A dx 

Z JU(x)>E % JU(x)>E 

where the second to last inequality follows from point (A) above, and the last 
inequality holds for E sufficiently large. 

Remark 2.3. The only place where we actually use the fact that $U(x)$ grows like
$|x|^2$ is in the proof of Lemma 5.3 below. On the other hand, the statement of that
approximation result would certainly be true also for potentials that grow slower
at $\infty$. However, such potentials would not be of interest for the present work.
Indeed, if the potential grows slower than $|x|^2$ and no slower than $|x|$, then MALA
can be shown to be exponentially ergodic, so that the results in this article would
be superfluous. If the potential grows slower than $|x|$, then MALA will not be
exponentially ergodic because the true solution of the SDE will not be either.
Remark 2.4. Assumption 2.1 (C) is equivalent to the existence of $C > 0$ such that
$\nabla U$ satisfies the one-sided Lipschitz property

$$\langle -\nabla U(x) + \nabla U(y),\, x - y \rangle \le C|x - y|^2, \quad \forall\, x, y \in \mathbb{R}^n.$$

All of these conditions are satisfied, for example, if $U$ is smooth and $U(x) \approx
|x|^a$ with $a > 2$ for large values of $x$. However, they also allow for potentials that
have very asymmetric growth at infinity, and they even allow for the potential to
grow at exponential speed. As a consequence of Assumption 2.1 (B), one has the
following drift condition on the transition probability of the solution.

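Lemma 2.5. Let $\Theta : \mathbb{R}_+ \to \mathbb{R}$ be a $C^2$ function such that there exist $u_0 > 0$ and
$a > 0$ such that $\Theta(u) > 0$, $\Theta'(u) > 0$, $u\Theta'(u) \ge a\Theta(u)$, and $\Theta''(u) \le (\beta - c)\Theta'(u)$
for $u \ge u_0$. (Here, the constant $c$ is the one appearing in (2.4) above.) Then, there
exist positive constants $K_\Theta$ and $\gamma_\Theta$ such that

$$\mathcal{L}(\Theta \circ U) \le K_\Theta - \gamma_\Theta\,(\Theta \circ U).$$

In particular,

$$(Q_t(\Theta \circ U))(x) \le e^{-\gamma_\Theta t}\,\Theta(U(x)) + \frac{K_\Theta}{\gamma_\Theta}\left(1 - e^{-\gamma_\Theta t}\right) \tag{2.6}$$

holds for every $t > 0$ and for every $x \in \mathbb{R}^n$.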

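Proof. Using the specific form of $\mathcal{L}$, it follows that for $U(x) \ge u_0$, we have

$$\begin{aligned}
\mathcal{L}(\Theta \circ U) &= (\Theta' \circ U)\,\mathcal{L}U + \frac{1}{\beta}(\Theta'' \circ U)|\nabla U|^2 \\
&= \frac{1}{\beta}\big((\Theta'' \circ U) - \beta(\Theta' \circ U)\big)|\nabla U|^2 + \frac{1}{\beta}(\Theta' \circ U)\,\Delta U \\
&\le \frac{1}{\beta}\big((\Theta'' \circ U) - (\beta - c)(\Theta' \circ U)\big)|\nabla U|^2 - \frac{d}{\beta}(\Theta' \circ U)\,U \\
&\le -\frac{ad}{\beta}(\Theta \circ U).
\end{aligned}$$

The result then follows at once from the fact that the condition $u\Theta'(u) \ge a\Theta(u)$
implies that $\Theta(u) \to \infty$ as $u \to \infty$. □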

Remark 2.6. The condition of Lemma 2.5 holds for example for $\Theta(u) = \exp(\theta u)$,
provided that $\theta < \beta - c$. It also holds for $\Theta(u) = u^\ell$ for every $\ell > 0$ and for
$\Theta(u) = u^\ell \exp(\theta u)$ with the same constraints on $\ell$ and $\theta$. This will be useful in
the sequel. Throughout this article, we will write $\Phi(x) = \exp(\theta U(x))$ for some
unspecified $\theta < \beta - c$, so that

$$\mathcal{L}\Phi \le K - \gamma\Phi. \tag{2.7}$$

When the precise value of $\theta$ matters, we will denote the corresponding function by $\Phi_\theta$.
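The drift condition can be checked by hand in a simple one-dimensional case. The sketch below assumes the illustrative potential $U(x) = 1 + x^4/4$ with $\beta = 1$ and $\theta = 1/2$ (choices made here for illustration only, not taken from the paper). By the chain rule, $\mathcal{L}\Phi/\Phi = \theta\big((\theta/\beta - 1)U'(x)^2 + U''(x)/\beta\big)$, which for these choices equals $-x^6/4 + 3x^2/2$. This quantity is bounded above by $\sqrt{2}$ and tends to $-\infty$ as $|x| \to \infty$.

```python
import math

BETA, THETA = 1.0, 0.5   # illustrative values with THETA < BETA

def U(x):
    return 1.0 + x**4 / 4.0   # assumed example potential

def dU(x):
    return x**3

def d2U(x):
    return 3.0 * x**2

def drift_ratio(x):
    """(L Phi)(x) / Phi(x) for Phi = exp(THETA * U), computed via the chain rule."""
    return THETA * ((THETA / BETA - 1.0) * dU(x) ** 2 + d2U(x) / BETA)

for x in (0.0, 1.0, 2.0, 4.0):
    print(f"x={x:4.1f}  U={U(x):8.2f}  (L Phi)/Phi={drift_ratio(x):10.3f}")
```

The ratio is negative as soon as $x^4 > 6$, so in this example $\Phi$ decays on average along the dynamics outside a compact set, which is the content of (2.7).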
As a consequence of the ellipticity of the SDE (2.1), one has the following
minorization condition on the solution's transition probability.

Lemma 2.7. For every $t > 0$ and $E > 0$, there exists $\epsilon > 0$ such that

$$\|Q_t(x, \cdot) - Q_t(y, \cdot)\|_{TV} \le 2(1 - \epsilon), \tag{2.8}$$

for all $x, y$ satisfying $U(x) \vee U(y) \le E$.

Remark 2.8. Here and in the sequel, the total variation distance between two probability
measures is defined as

$$\|\mu - \nu\|_{TV} = 2 \sup_A |\mu(A) - \nu(A)|,$$

where the supremum runs over all measurable sets. In particular, the total variation
distance between two probability measures is two if and only if they are mutually
singular.
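For distributions on a finite set, this normalisation makes the total variation distance coincide with the $\ell^1$ distance between the probability vectors. The following minimal sketch (the function name `tv_distance` and the dictionary representation are our own illustrative choices) shows the convention at work.

```python
def tv_distance(p, q):
    """Total variation distance between two discrete distributions given as
    dicts {outcome: probability}, normalised so that mutually singular
    distributions are at distance 2. In the discrete case,
    2 * sup_A |p(A) - q(A)| equals the sum of |p_i - q_i|.
    """
    outcomes = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in outcomes)

print(tv_distance({"a": 0.5, "b": 0.5}, {"c": 1.0}))              # disjoint supports
print(tv_distance({"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75}))  # overlapping supports
```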

Proof of Lemma 2.7. It follows from the ellipticity of the equations that there exists
a function $q(t, x, y)$, smooth in all of its arguments (for $t > 0$), such that the
transition probabilities are given by $Q_t(x, dy) = q(t, x, y)\,dy$. Furthermore, $q$ is
strictly positive (see, e.g., Lemma 2.2 of [Tal02]). Hence, by the compactness of
the set $\{x : U(x) \le E\}$, one can find a probability measure $\eta$ and a constant
$\epsilon > 0$ such that

$$Q_t(x, \cdot) \ge \epsilon\,\eta(\cdot)$$

for any $x$ satisfying $U(x) \le E$. This condition implies that the following transition
probability $\tilde Q_t$ is well-defined:

$$\tilde Q_t(x, \cdot) = \frac{1}{1 - \epsilon}\,Q_t(x, \cdot) - \frac{\epsilon}{1 - \epsilon}\,\eta(\cdot)$$

for any $x$ satisfying $U(x) \le E$. Therefore,

$$\|Q_t(x, \cdot) - Q_t(y, \cdot)\|_{TV} = (1 - \epsilon)\,\|\tilde Q_t(x, \cdot) - \tilde Q_t(y, \cdot)\|_{TV}$$

for all $x, y$ satisfying $U(x) \vee U(y) \le E$. Since the TV norm is bounded by 2, one
obtains the desired result. □

Harris' theorem can now be invoked to conclude that the transition probability of
the true solution converges at a geometric rate to its equilibrium measure. For
the reader's convenience, we state the precise version used in this article. For a
proof, see the monograph [MT09], or [HM08] for a shorter and somewhat more
constructive version. Harris' theorem essentially states that if a Markov chain $\mathcal{P}$
on an arbitrary (Polish) state space $X$ admits a Lyapunov function such that its
sublevel sets are 'small', then it is exponentially ergodic. More precisely, Harris'
theorem applies to any Markov chain that satisfies the following assumptions:
Assumption 2.9 (Drift Condition). There exists a function $\Phi : X \to \mathbb{R}_+$ and
constants $\gamma \in (0, 1)$ and $K > 0$, such that the Markov chain $\mathcal{P}$ satisfies

$$(\mathcal{P}\Phi)(x) \le \gamma\Phi(x) + K, \tag{2.9}$$

for all $x \in X$.

Assumption 2.10 (Associated 'Minorization' Condition). There exists a constant
$\alpha \in (0, 1)$ so that the Markov chain $\mathcal{P}$ satisfies

$$\|\mathcal{P}(x, \cdot) - \mathcal{P}(y, \cdot)\|_{TV} \le 2(1 - \alpha), \tag{2.10}$$

for all $x, y \in \mathbb{R}^n$ with $\Phi(x) + \Phi(y) \le 4K/(1 - \gamma)$, where $K$ and $\gamma$ are the
constants from Assumption 2.9.

Note that in this statement, we have normalised the total variation distance
between two probability measures in such a way that it is equal to 2 if and only if
the measures are mutually singular. One then has:

Theorem 2.11 (Harris' theorem). Suppose a Markov chain $\mathcal{P}(x, dy)$ on $\mathbb{R}^n$ satisfies
Assumptions 2.9 and 2.10. Then there exists a unique invariant measure $\mu$ for
$\mathcal{P}$ and there are constants $C > 0$ and $\varrho < 1$, both depending only on the constants
$\gamma$, $K$ and $\alpha$ appearing in the assumptions, such that

$$\|\mathcal{P}^n(x, \cdot) - \mu\|_{TV} \le C\varrho^n\,\Phi(x),$$

for any $x \in \mathbb{R}^n$.

With this tool at hand, we obtain the following exponential ergodicity result for
the solutions to (2.1):

Theorem 2.12. Let $U$ be a potential function satisfying Assumption 2.1. Then, for
every $\theta \in (0, \beta - c)$ there exist positive constants $\delta \in (0, 1)$ and $C$ such that

$$\|Q_t^k(x, \cdot) - \mu\|_{TV} \le C\delta^k \exp(\theta U(x)) \tag{2.11}$$

for all $t > 0$ and all $x \in \mathbb{R}^n$.

Proof. According to Remark 2.6, for every $\theta \in (0, \beta - c)$, $\exp(\theta U)$ is a Lyapunov
function for the Markov chain $Q_t$. Moreover, by Lemma 2.7 it satisfies a minorization
condition on every sublevel set of $U$. Hence, Harris' theorem implies that
(2.11) holds. □

Next we recall some integration strategies for (2.1) and summarize their properties.
In particular, we discuss to what extent these strategies preserve the geometric
rate of convergence of the true solution.
2.2 Forward Euler

Let the time-stepsize $h$ be given, set $t_k = hk$ for $k \in \mathbb{N}$, and consider the following
forward Euler discretization of (2.1):

$$X_{k+1} = X_k - h\nabla U(X_k) + \sqrt{2\beta^{-1}}\,(W(t_{k+1}) - W(t_k)), \quad X_0 = x \in \mathbb{R}^n. \tag{2.12}$$

Here $X_k$ should be viewed as an approximation to $Y_k = Y(t_k)$. The iteration
rule (2.12) defines a Markov chain that possesses a transition probability with the
following smooth, strictly positive transition density:

$$q_h(x, y) = (4\pi\beta^{-1}h)^{-n/2} \exp\left(-\frac{\beta\,|y - x + h\nabla U(x)|^2}{4h}\right). \tag{2.13}$$

Hence, the chain is irreducible with respect to Lebesgue measure.
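As an illustration, a one-dimensional version of the update (2.12) and of the density (2.13) can be sketched as follows. The function names and the restriction to $n = 1$ are our own simplifying assumptions; the gradient `grad_U` is supplied by the caller.

```python
import math
import random

def euler_step(x, h, beta, grad_U, rng):
    """One forward Euler step (2.12) in dimension one.

    The Brownian increment over a step of length h is N(0, h), so the noise
    term sqrt(2/beta) * (W(t_{k+1}) - W(t_k)) has variance 2h/beta.
    """
    noise = rng.gauss(0.0, math.sqrt(2.0 * h / beta))
    return x - h * grad_U(x) + noise

def q_h(x, y, h, beta, grad_U):
    """Transition density (2.13) of the step above, for n = 1."""
    var = 2.0 * h / beta
    mean = x - h * grad_U(x)
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Example with the quartic potential U(x) = x**4 / 4, so grad U(x) = x**3.
rng = random.Random(0)
print(euler_step(1.0, 0.1, 1.0, lambda x: x**3, rng))
print(q_h(1.0, 0.9, 0.1, 1.0, lambda x: x**3))
```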

If $\nabla U$ is globally Lipschitz and $h$ is small enough, forward Euler (2.12) can
be shown to be exponentially ergodic with respect to a probability distribution that
is a first-order approximant to the equilibrium distribution of the SDE (2.1). This
property is typically established using a Talay-Tubaro expansion of the global weak
error of forward Euler [TT90].

When $\nabla U$ is not globally Lipschitz, forward Euler is a transient Markov chain
for any $h > 0$. In fact, all moments of forward Euler are unbounded on long time
intervals for any initial condition $x \in \mathbb{R}^n$. To be precise, for any integer $\ell \ge 1$ and
for any $h > 0$,

$$\mathbb{E}^x |X_k|^\ell \to \infty \quad \text{as } k \to \infty, \tag{2.14}$$

where $\mathbb{E}^x$ denotes the expectation conditional on $X_0 = x$; see e.g. [MSH02,
Tal02]. This instability implies that an equilibrium trajectory of forward Euler
does not sample any probability distribution. As is well known in the literature, a
Metropolis-Hastings method can stochastically stabilize forward Euler.
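The mechanism behind this instability is already visible in the noise-free part of the update. For the illustrative potential $U(x) = x^4/4$ (our choice of example), the Euler drift maps $x$ to $x - hx^3$, and $|x - hx^3| > |x|$ exactly when $x^2 > 2/h$. The sketch below shows that the energy decreases below this threshold and increases above it.

```python
def U(x):
    return x**4 / 4.0          # assumed example potential

def euler_drift(x, h):
    """Noise-free forward Euler update for U(x) = x**4 / 4 (grad U = x**3)."""
    return x - h * x**3

h = 0.1
threshold = (2.0 / h) ** 0.5   # beyond this |x| the Euler drift increases |x|
print(f"threshold |x| = {threshold:.3f}")
for x0 in (1.0, 4.0, 5.0):
    x1 = euler_drift(x0, h)
    print(f"x0={x0}  x1={x1:.3f}  energy increases: {U(x1) > U(x0)}")
```

Starting above the threshold, repeated application of the drift map grows without bound, whereas the exact gradient flow always decreases the energy.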

2.3 MALA Algorithm 

A Metropolis-Hastings method is a Monte-Carlo method for producing samples
from a known probability distribution [MRTT53, Has70]. The method generates a
Markov chain from a given proposal Markov chain as follows. A proposal move
is computed according to the proposal chain and accepted with a probability that
ensures the Metropolized chain is ergodic with respect to the given probability distribution.
Here we shall focus on the Metropolized forward Euler integrator defined
in terms of the equilibrium density $\pi$ (2.2) and the transition density (2.13).

Given a time-stepsize $h$ and input state $X_k$, the algorithm calculates a proposal
move using the forward Euler updating scheme in (2.12):

$$X^\star_{k+1} = X_k - h\nabla U(X_k) + \sqrt{2\beta^{-1}}\,(W(t_{k+1}) - W(t_k)), \tag{2.15}$$
and accepts this proposal with probability

$$\alpha_h(x, y) = 1 \wedge \frac{q_h(y, x)\,\pi(y)}{q_h(x, y)\,\pi(x)}. \tag{2.16}$$

In other words, if $\zeta_k \sim U(0, 1)$ is an i.i.d. sequence of uniformly distributed random
variables, the update is defined as:

$$X_{k+1} = \begin{cases} X^\star_{k+1} & \text{if } \zeta_k < \alpha_h(X_k, X^\star_{k+1}), \\ X_k & \text{otherwise} \end{cases} \tag{2.17}$$

for $k \in \mathbb{N}$. To be consistent with the literature, we will refer to the Metropolized
forward Euler integrator as the Metropolis-adjusted Langevin algorithm (MALA)
[RT96b]. We emphasize that MALA is a special case of the smart and hybrid
Monte-Carlo algorithms, which are older and more general sampling methods; see
[RDF78, DKPR87]. By construction, MALA preserves the invariant measure $\mu$ of
(2.1). This implies for any $g : \mathbb{R}^n \to \mathbb{R}$,

$$\mathbb{E}^\mu(g(X_k)) = \int_{\mathbb{R}^n} g(x)\,\mu(dx), \quad \forall\, k \in \mathbb{N}. \tag{2.18}$$

Here $\mathbb{E}^\mu$ denotes expectation conditioned on the initial distribution of the integrator
being the equilibrium distribution of the SDE (2.1):

$$\mathbb{E}^\mu(g(X_k)) = \int_{\mathbb{R}^n} \mathbb{E}^x(g(X_k))\,\mu(dx), \quad X_0 = x \in \mathbb{R}^n.$$
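The proposal, the acceptance probability and the accept/reject update can be sketched in a few lines. The example below is a one-dimensional illustration under assumptions of our own: the potential $U(x) = 1 + x^4/4$, $\beta = 1$, and the helper names `log_q`, `accept_prob`, `mala_step` and `run_chain`. The acceptance ratio is evaluated in logarithmic form, so that the normalising constants of $\pi$ and $q_h$ cancel and never need to be computed.

```python
import math
import random

BETA = 1.0

def U(x):
    return 1.0 + x**4 / 4.0    # assumed example potential

def grad_U(x):
    return x**3

def log_q(x, y, h):
    """Log of the forward Euler density q_h(x, y) in 1D, up to a constant
    that is the same for q_h(x, y) and q_h(y, x)."""
    return -BETA * (y - x + h * grad_U(x)) ** 2 / (4.0 * h)

def accept_prob(x, y, h):
    """Acceptance probability: 1 ^ [q_h(y,x) pi(y)] / [q_h(x,y) pi(x)]."""
    log_ratio = (log_q(y, x, h) - BETA * U(y)) - (log_q(x, y, h) - BETA * U(x))
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)

def mala_step(x, h, rng):
    """Forward Euler proposal followed by the Metropolis accept/reject step."""
    proposal = x - h * grad_U(x) + rng.gauss(0.0, math.sqrt(2.0 * h / BETA))
    return proposal if rng.random() < accept_prob(x, proposal, h) else x

def run_chain(x0, h, n_steps, seed=0):
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        xs.append(mala_step(xs[-1], h, rng))
    return xs

xs = run_chain(0.0, 0.1, 20000)
print("empirical second moment:", sum(v * v for v in xs) / len(xs))
```

For this target, the exact second moment is $2\Gamma(3/4)/\Gamma(1/4) \approx 0.676$, which a long run of the chain should approximate regardless of the stepsize, since the accept/reject step removes the discretization bias in the invariant measure.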



Moreover, it is quite standard to show that MALA gives rise to an ergodic Markov
chain. Indeed, denoting by $P_h$ the transition probabilities defined by (2.17), one
has

Theorem 2.13 (Roberts and Tweedie, [RT96a]). Let $U$ be a potential satisfying
Assumption 2.1. For any $h > 0$ the $k$-step transition probability of MALA converges
to $\mu$ in the total variation metric on probability measures, that is

$$\lim_{k \to \infty} \|P_h^k(x, \cdot) - \mu\|_{TV} = 0,$$

for all $x \in \mathbb{R}^n$.

If $\nabla U$ is globally Lipschitz and $h$ is small enough, MALA is geometrically
ergodic (see Theorem 4.1 of [RT96a]). However, if $\nabla U$ is not globally Lipschitz,
MALA is not geometrically ergodic even though the solution to the SDE is (see
Theorem 4.2 of [RT96a]). Specifically, one can prove the following.

Theorem 2.14 (Roberts and Tweedie, [RT96a]). Let $U$ be a potential satisfying
Assumption 2.1. If

ta W »>M (2.19) 

|a:|-»oo | a; I h 

then MALA operated at time-stepsize h is not geometrically ergodic. 
If (2.19) holds, the tail of the equilibrium density is no heavier than Gaussian.
For example, if $U(x) = x^4/4$ then

$$\liminf_{|x| \to \infty} \frac{|\nabla U(x)|}{|x|} = \infty.$$

In this case the theorem states that MALA is not geometrically ergodic, in contrast to
the true solution of the SDE. The main purpose of this article is to argue that, up
to errors that are exponentially small in the time-step size $h$, the convergence of
the transition probabilities of MALA towards equilibrium still takes place at an
exponential rate. The next section gives a precise statement of this result, as well
as an overview of its proof.

3 Main Results 

We now state and prove the main result of the paper. Throughout this section,
$P_h$ will denote the one-step transition probabilities of the MALA algorithm as defined
in Section 2.3 above. We will also use throughout this section the shorthand
notation $P = P_h^{\lfloor 1/h \rfloor}$ for the evolution of MALA over one unit of 'physical time'.

Theorem 3.1. Let $U$ be a potential function satisfying Assumption 2.1 and let $P$
be as above. Then, there exists $\tilde\delta \in (0, 1)$ and, for every $E_0 > 0$, there exist positive
constants $C_1$, $C_2$ and $h_c(E_0)$ such that MALA's distance to stationarity satisfies

$$\|P^k(x, \cdot) - \mu\|_{TV} \le C_1\Phi(x)\left(\tilde\delta^k + e^{-C_2/h^{1/4}}\right),$$
for all $k \in \mathbb{N}$, all stepsizes $h < h_c$, and all $x$ satisfying $U(x) \le E_0$.

To quantify MALA's distance to stationarity, $\|P^k(x, \cdot) - \mu\|_{TV}$, we adopt a
patching argument. The point of the patching argument is to use compactness to
boost a local property of MALA to a global property. The main ingredient of
this argument is a version of MALA with reflection on the boundaries of certain
compact sets.

To introduce this patched version of MALA, set $R_h = \{x : U(x) \le E_h\}$,
where $E_h = E_\star h^{-1/4}$ for a constant $E_\star$ yet to be determined. The 'patched'
MALA algorithm is then defined as a Metropolized version of forward Euler with
a reflecting boundary condition at the boundary of $R_h$. This boundary condition
is enforced by setting the target distribution in MALA to be the equilibrium distribution
$\mu$ conditional on being in $R_h$. This distribution possesses the following
density with respect to Lebesgue measure:

$$\tilde\pi(x) = \tilde Z^{-1} e^{-\beta U(x)}\,\mathbf{1}_{R_h}(x), \tag{3.1}$$

where $\tilde Z = \int_{R_h} \exp(-\beta U(x))\,dx$ and $\mathbf{1}_{R_h}$ is the indicator function for the set
$R_h \subset \mathbb{R}^n$.
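To be more precise, given a time-stepsize $h$ and input state $\tilde X_k \in R_h$, the
algorithm calculates a proposal move using the forward Euler updating scheme
in (2.12):

$$\tilde X^\star_{k+1} = \tilde X_k - h\nabla U(\tilde X_k) + \sqrt{2\beta^{-1}}\,(W(t_{k+1}) - W(t_k)), \tag{3.2}$$
and accepts this proposal with probability

$$\tilde\alpha_h(x, y) = \begin{cases} 1 \wedge \dfrac{q_h(y, x)\,\pi(y)}{q_h(x, y)\,\pi(x)} & \text{if } y \in R_h, \\ 0 & \text{otherwise.} \end{cases} \tag{3.3}$$

In other words, if $\zeta_k \sim U(0, 1)$ is an i.i.d. sequence of uniformly distributed random
variables, the update is defined as:

$$\tilde X_{k+1} = \begin{cases} \tilde X^\star_{k+1} & \text{if } \zeta_k < \tilde\alpha_h(\tilde X_k, \tilde X^\star_{k+1}), \\ \tilde X_k & \text{otherwise} \end{cases} \tag{3.4}$$

for $k \in \mathbb{N}$. We stress that patched MALA always remains in $R_h$ since it rejects all
moves to $R_h^c$. Let $\tilde P_h$ denote the transition probability of patched MALA. Let $\tilde\mu$
denote the invariant measure of $\tilde P_h$ with density $\tilde\pi$. The invariant measures of $P_h$
and $\tilde P_h$ are related by:

$$\tilde\mu(A) = \frac{\mu(A \cap R_h)}{\mu(R_h)}, \tag{3.5}$$

for all measurable sets $A$. Set $\tilde P = \tilde P_h^{\lfloor 1/h \rfloor}$. With this notation we are ready to
prove Theorem 3.1.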

Proof of Main Result. This proof relies on Lemmas 3.2 and 3.3 provided below.
Using the triangle inequality, we bound the distance of $P^k$ to stationarity by

$$\|P^k(x, \cdot) - \mu\|_{TV} \le \|P^k(x, \cdot) - \tilde P^k(x, \cdot)\|_{TV} + \|\tilde P^k(x, \cdot) - \tilde\mu\|_{TV} + \|\tilde\mu - \mu\|_{TV} \overset{\text{def}}{=} I_1 + I_2 + I_3. \tag{3.6}$$

We now bound all three terms separately.

Lemma 3.2 bounds $I_1$ in (3.6) using a coupling between MALA and patched
MALA, and the coupling characterization of the total variation distance. The
lemma states that for every $E_0 > 0$ there exist positive constants $C_1$ and $h_c$ such that

$$I_1 = \|P^k(x, \cdot) - \tilde P^k(x, \cdot)\|_{TV} \le C_1\Phi(x)\,e^{-\beta E_h}\,k, \tag{3.7}$$

for all $h < h_c$ and every $x$ satisfying $U(x) \le E_0$.

Lemma 3.3 bounds $I_2$ in (3.6) by using Harris' theorem, Theorem 2.11. This
lemma relies on a drift and minorization condition for patched MALA. The lemma
states that patched MALA is exponentially ergodic, that is, for every $\tilde\delta \in (\delta, 1)$ and
$E_0 > 0$, there exist positive constants $C_3$ and $h_c$ such that

$$I_2 = \|\tilde P^k(x, \cdot) - \tilde\mu\|_{TV} \le C_3\Phi(x)\,\tilde\delta^k, \tag{3.8}$$
for all $h < h_c$ and for all $x$ satisfying $U(x) \le E_0$.

To bound $I_3$, we use the characterisation of $\tilde\mu$ in (3.5) and the definition of the
total variation distance, to get

$$\|\tilde\mu - \mu\|_{TV} = 2\mu(R_h^c) \le 2e^{-\beta E_h/2}, \tag{3.9}$$

where we used Remark 2.2 to obtain the inequality.
Combining the bounds (3.7), (3.8) and (3.9) yields

$$\|P^k(x, \cdot) - \mu\|_{TV} \le C_1\Phi(x)\,e^{-\beta E_h}\,k + C_3\Phi(x)\,\tilde\delta^k + 2e^{-\beta E_h/2}. \tag{3.10}$$

Since the total variation distance between a Markov chain and its invariant measure
is nonincreasing in time, the linear dependence on $k$ can be eliminated as
follows. Set $k = \lceil h^{-1/4} \rceil$ in (3.10) to obtain:

$$C_1\Phi(x)\,e^{-\beta E_h}\lceil h^{-1/4} \rceil + C_3\Phi(x)\,e^{\ln(\tilde\delta)\lceil h^{-1/4} \rceil} + 2e^{-\beta E_h/2}.$$

Since $E_h \propto h^{-1/4}$, there exist positive constants $C_1$ and $C_2$ such that

$$\|P^k(x, \cdot) - \mu\|_{TV} \le C_1\Phi(x)\left(e^{-C_2/h^{1/4}} + \tilde\delta^k\right),$$

for all $k \in \mathbb{N}$ and every $x$ satisfying $U(x) \le E_0$. This observation concludes the
proof. □

The next lemma bounds $I_1$ in (3.6) using the drift condition obtained in Lemma 3.3.

Lemma 3.2. Provided that $E_\star$ is sufficiently small, there exist positive constants
$C_1$, $C_2$ and $h_c$ such that

$$\sup_{t \in [0, T]} \|P_h^{\lfloor t/h \rfloor}(x, \cdot) - \tilde P_h^{\lfloor t/h \rfloor}(x, \cdot)\|_{TV} \le C_1\Phi(x)\,e^{-C_2/h^{1/4}}(1 + T)$$

for all $x \in R_h$, every $h < h_c$, and every $T > 0$.

Proof. The measures $P_h(x, \cdot)$ and $\tilde P_h(x, \cdot)$ are not the same, even for a point $x \in
R_h$, since their invariant distributions are different. In particular, patched MALA
rejects all proposed moves to $R_h^c$. However, if the input state and proposed move
are in $R_h$, the acceptance probabilities of the two chains are the same. Hence, if
we initiate the two chains in $R_h$, and drive them by the same realization of noise,
we obtain a coupling between the two chains such that they are identical up until
the first time MALA hits $R_h^c$. Based on this observation, we obtain a bound on the
total variation difference between the transition probabilities of the two chains in
the following way.
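The coupling just described can be observed directly in a simulation. The sketch below runs MALA and patched MALA in one dimension with shared Gaussian increments and shared uniform acceptance variables, and records the first time MALA leaves the level set. The potential $U(x) = 1 + x^4/4$, the value $\beta = 1$, the fixed energy level `e_level` and all function names are illustrative assumptions, not part of the analysis.

```python
import math
import random

BETA = 1.0

def U(x):
    return 1.0 + x**4 / 4.0    # assumed example potential

def grad_U(x):
    return x**3

def accept_prob(x, y, h):
    """MALA acceptance probability for the 1D forward Euler proposal."""
    def log_q(a, b):
        return -BETA * (b - a + h * grad_U(a)) ** 2 / (4.0 * h)
    log_ratio = (log_q(y, x) - BETA * U(y)) - (log_q(x, y) - BETA * U(x))
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)

def coupled_run(x0, h, e_level, n_steps, seed=0):
    """Drive MALA and patched MALA with the same noise and the same uniforms.

    Returns both paths and the first step at which MALA has U > e_level
    (None if this never happens within n_steps).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * h / BETA)
    x, x_patched = x0, x0
    path, path_patched, exit_time = [x0], [x0], None
    for k in range(1, n_steps + 1):
        xi, zeta = rng.gauss(0.0, sigma), rng.random()
        y = x - h * grad_U(x) + xi
        if zeta < accept_prob(x, y, h):
            x = y
        y_p = x_patched - h * grad_U(x_patched) + xi
        # Patched chain: proposals outside {U <= e_level} are always rejected.
        if U(y_p) <= e_level and zeta < accept_prob(x_patched, y_p, h):
            x_patched = y_p
        path.append(x)
        path_patched.append(x_patched)
        if exit_time is None and U(x) > e_level:
            exit_time = k
    return path, path_patched, exit_time

path, path_patched, tau = coupled_run(0.0, 0.1, e_level=1.5, n_steps=2000, seed=1)
print("first exit time of MALA from the level set:", tau)
```

Up to (but not including) the recorded exit time, the two paths agree exactly, and the patched path never leaves the level set.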

Let $\{X_k\}$ and $\{\tilde X_k\}$ be instances of the Markov chains with respective transition
probabilities $P_h$ and $\tilde P_h$, driven by the same realization of the noise $W$, the
same realisation of the acceptance variables $\zeta_k$ and with identical initial conditions
$X_0 = \tilde X_0 = x \in R_h$. As argued above, we then have $X_k = \tilde X_k$ for $k \le n$
provided that the first time MALA hits $R_h^c$ is greater than $n$. Let $\tau_h$ denote the first
time that $X_k$ hits $R_h^c$. The coupling characterization of the total variation distance
implies that

$$\|P_h^n(x, \cdot) - \tilde P_h^n(x, \cdot)\|_{TV} \le 2\,\mathbb{P}^x(X_n \ne \tilde X_n) \le 2\,\mathbb{P}^x(\tau_h \le n).$$

At this stage, one of our main ingredients is the fact that the function $\Phi(x) =
\exp(\theta U(x))$ is a Lyapunov function for the MALA algorithm, see Proposition 5.2
below. The probability of MALA hitting $R_h^c$ before time $n$ can therefore be
bounded as

$$\mathbb{P}^x(\tau_h \le n) \le \sum_{k=1}^n \mathbb{P}^x\big(\Phi(X_k) > e^{\theta E_h}\big) \le e^{-\theta E_h} \sum_{k=1}^n \mathbb{E}^x\Phi(X_k),$$

where we made use of Chebychev's inequality. We now note that we can apply
Proposition 5.2 since $E_h \le h^{-1/2}$ for $h$ sufficiently small. Since $E_h = E_\star h^{-1/4}$,
we can make $E_\star$ sufficiently small so that there exists some $\gamma > 0$ such that

$$\mathbb{E}^x\Phi(X_1) \le e^{-\gamma h}\,\Phi(x) + Kh.$$

Combining this with the previous bound, we obtain

$$\mathbb{P}^x(\tau_h \le n) \le e^{-\theta E_h} \sum_{k=1}^n \left(e^{-\gamma kh}\,\Phi(x) + \frac{Kh}{1 - e^{-\gamma h}}\right).$$

Summing over $k$ and using the fact that $E_h \propto h^{-1/4}$ yields the existence of
positive constants $C_1$ and $C_2$ such that

$$\mathbb{P}^x(\tau_h \le n) \le C_1\Phi(x)\,e^{-C_2/h^{1/4}}(1 + T), \tag{3.11}$$
which is indeed the desired result. □

The following lemma proves a geometric rate of convergence for the Markov
chain $\tilde P$. Recall $R_h = \{x : U(x) \le E_h\}$. The key tool used is Harris' theorem,
Theorem 2.11.

Lemma 3.3. For every $\tilde\delta \in (\delta, 1)$, there exist positive constants $C$ and $h_c$ such that

$$\|\tilde P^k(x, \cdot) - \tilde\mu\|_{TV} \le C\Phi(x)\,\tilde\delta^k$$

for all $x \in R_h$ and $h < h_c$. In particular, $\tilde\delta$ is independent of the time-stepsize.

Proof. To prove this result, we use once again Harris' theorem. The verification
of its conditions for the Markov chain $\tilde P$ is precisely the content of Lemmas 3.4
and 3.5 below. □

In the next lemma, a minorization condition for patched MALA is derived using
finite time accuracy of patched MALA in the TV norm (see Lemma 4.1).

Lemma 3.4. Let $U$ be a potential function satisfying Assumption 2.1. Let $\epsilon$ be
the constant appearing in the minorization condition of the true solution (see
Lemma 2.7), and let $\tilde P$ be as above. For every $E > 0$ and $\tilde\epsilon \in (0, \epsilon)$, there exists
a positive constant $h_c$ such that

$$\|\tilde P(x, \cdot) - \tilde P(y, \cdot)\|_{TV} \le 2(1 - \tilde\epsilon), \tag{3.12}$$

for all $x, y$ satisfying $U(x) \vee U(y) \le E$ and $h < h_c$.

Proof. According to Lemma 2.7, the bound (3.12) holds when $\tilde P$ is replaced by
$Q_1$, the transition probability for the true solution $Y$ at time one. Combining this
with Lemma 4.1 below, we thus obtain

$$\|\tilde P(x, \cdot) - \tilde P(y, \cdot)\|_{TV} \le 2(1 - \epsilon) + 2 \sup_{\Phi(x) \le E} \|\tilde P(x, \cdot) - Q_1(x, \cdot)\|_{TV} \le 2(1 - \epsilon) + C(E)\sqrt{h}.$$

Choosing $h$ sufficiently small so that $C(E)\sqrt{h} \le 2(\epsilon - \tilde\epsilon)$, the claim follows. □

In the next lemma, we derive a drift condition for patched MALA using its
single-step accuracy in representing the Lyapunov function $\Phi$. Deriving this drift
condition requires a generalization of Theorem 7.2 in [MSH02] to Lyapunov functions
that are neither globally Lipschitz nor essentially quadratic.

Lemma 3.5. Let U be a potential function satisfying Assumption 2.1 and let $\gamma$ be
the constant appearing in the drift condition (2.7). For every $\tilde\gamma \in (0, \gamma/2)$, there
exist positive constants $E_\star$ and $h_c$ such that

$$ \mathbf{E}_x\big(\Phi(\tilde X_{\lfloor 1/h \rfloor})\big) \le e^{-\tilde\gamma}\, \Phi(x) + K , \qquad (3.13) $$

for all $x \in R_h$ and all $h < h_c$.

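Proof. We will actually show that

$$ \mathbf{E}_x(\Phi(\tilde X_1)) \le (1 - \tilde\gamma h)\, \Phi(x) + Kh , $$

from which the required bound follows by induction, noting that $U(\tilde X_k) \le E_h$ for
every $k \ge 0$ by construction (here $\tilde X$ is the patched chain, $X$ the MALA chain and
$X_1^\star$ the proposal move).

We decompose the expression that we want to bound as

$$ \mathbf{E}_x(\Phi(\tilde X_1)) = \mathbf{E}_x\big(\Phi(\tilde X_1),\, X_1^\star \in R_h\big) + \Phi(x)\, \mathbf{P}_x(X_1^\star \in R_h^c) . $$

Since

$$ \mathbf{E}_x\big(\Phi(\tilde X_1) \mid X_1^\star \in R_h\big) = \mathbf{E}_x\big(\Phi(X_1) \mid X_1^\star \in R_h\big) \le \mathbf{E}_x(\Phi(X_1)) , $$
it follows that

$$ \mathbf{E}_x(\Phi(\tilde X_1)) \le \mathbf{E}_x(\Phi(X_1))\, \mathbf{P}_x(X_1^\star \in R_h) + \Phi(x)\, \mathbf{P}_x(X_1^\star \in R_h^c) . $$
Since $E_h \le h^{-1/2}$ for $h$ sufficiently small, we can apply Proposition 5.2 to the first
term in this expression, thus obtaining

$$ \mathbf{E}_x(\Phi(\tilde X_1)) \le \Big(1 + \mathbf{P}_x(X_1^\star \in R_h)\big(e^{-\gamma h} - 1 + C E_\star^4 h\big)\Big)\Phi(x) + Kh . $$

By making $E_\star$ sufficiently small, the requested bound now follows, provided that
we can find a lower bound on $\mathbf{P}_x(X_1^\star \in R_h)$ that is arbitrarily close to $\tfrac12$ for small
values of $h$.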

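Recall that we have the identity

$$ \mathbf{P}_x(X_1^\star \in R_h) = (4\pi\beta^{-1}h)^{-n/2} \int_{R_h} \exp\Big( -\frac{\beta\, |y - x + h\nabla U(x)|^2}{4h} \Big)\, dy . $$

Using $(a + b)^2 \le 2a^2 + 2b^2$ and Assumption 2.1 (D), it follows that we can bound
this by

$$ \mathbf{P}_x(X_1^\star \in R_h) \ge (4\pi\beta^{-1}h)^{-n/2} \exp\Big( -\frac{\beta h}{2} |\nabla U(x)|^2 \Big) \int_{R_h} \exp\Big( -\frac{\beta\, |y - x|^2}{2h} \Big)\, dy $$
$$ \qquad = 2^{-n/2} \exp\Big( -\frac{\beta h}{2} |\nabla U(x)|^2 \Big)\, \mathbf{P}(\xi + x \in R_h) , \qquad (3.14) $$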



where $\xi$ denotes a Gaussian random variable with distribution $\mathcal{N}(0, \beta^{-1}h)$. In
order to bound this term, denote by $n(x)$ the unit vector opposite the direction of
the gradient of $U$ at $x$, i.e. $n(x) = -\nabla U(x)/|\nabla U(x)|$. We claim that for every
$\delta > 0$, there exist $C > 0$ and $E_0 > 0$ such that for every unit vector $m$ with
$\langle m, n(x) \rangle \ge \delta$, we have $U(x + \kappa m) \le U(x)$, provided that $\kappa \le C\, U(x)^{-1/2}$ and
$U(x) \ge E_0$.

Indeed, consider the function $f(\kappa) = U(x + \kappa m) - U(x)$. Then $f$ is a smooth
function such that $f(0) = 0$ and $f'(0) \le -\delta |\nabla U(x)| \le -C_1 \delta \sqrt{U(x)}$ for some $C_1$
by Assumption 2.1. Furthermore, one has $f''(\kappa) \le C_2 U(x)$ for some $C_2$, as long
as $f(\kappa) \le 0$. Combining these, we see that $f'(\kappa) \le 0$ (and therefore $f(\kappa) \le 0$) for
every $\kappa \le \delta C_1 / (C_2 \sqrt{U(x)})$, as claimed.

For every $x \in R_h$, we now define a set $A(x) \subset S^{n-1}$ by $A(x) = \{m :
x + \kappa m \in R_h \ \forall \kappa \le E_h^{-1}\}$. As a consequence of our previous claim, for any
$a < \tfrac12$ there exists $h_c$ such that if $h \le h_c$, one has $\inf_{x \in R_h} |A(x)|/|S^{n-1}| \ge a$,
where $|\cdot|$ denotes the surface measure on the sphere. Denoting by $B(x, r)$ the ball
of radius $r$ centered at $x$, we conclude that

$$ \mathbf{P}(\xi + x \in R_h) \ge \mathbf{P}\big(\xi + x \in R_h \cap B(x, E_h^{-1})\big) \ge a\, \mathbf{P}\big(|\xi| \le h^{1/4} E_\star^{-1}\big) , $$

where we used Assumption 2.1 (E) to obtain the last inequality. By making $h$
sufficiently small, this expression can be made arbitrarily close to $a$, and the prefactor
in (3.14) can be made arbitrarily close to 1, thus yielding the required bound. □
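
The induction invoked at the start of the proof can be written out as follows. This is only a sketch, under the assumption that the one-step bound applies at every step (each iterate of patched MALA stays in $R_h$ by construction) and that $\tilde\gamma h < 1$; the shorthand $a_k$ for the expected value of the Lyapunov function after $k$ steps is introduced here for illustration and does not appear in the original argument.

```latex
% a_k := expected value of \Phi after k steps of patched MALA started at x,
% so that a_0 = \Phi(x) and a_{k+1} \le (1 - \tilde\gamma h) a_k + K h.
\begin{align*}
a_n &\le (1 - \tilde\gamma h)^n a_0 + K h \sum_{j=0}^{n-1} (1 - \tilde\gamma h)^j
     \le e^{-\tilde\gamma h n}\, a_0 + K h n .
\end{align*}
% With n = \lfloor 1/h \rfloor one has 1 - h < h n \le 1, hence
\begin{align*}
a_{\lfloor 1/h \rfloor} \le e^{\tilde\gamma h}\, e^{-\tilde\gamma}\, \Phi(x) + K .
\end{align*}
```

The extra factor $e^{\tilde\gamma h}$ tends to 1 as $h \to 0$ and can be absorbed by decreasing $\tilde\gamma$ slightly, which is harmless since $\tilde\gamma$ ranges over an open interval.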
4 Accuracy of the Patched MALA Algorithm

When all of the derivatives of U are bounded, accuracy in the total variation
distance for forward Euler has been derived using a Talay-Tubaro expansion and
Malliavin integration by parts [BT95]; see also [TT90]. In this section we treat
the situation where the derivatives of U are unbounded. The order of accuracy
obtained below is not sharp, but the proof is constructive and is sufficient for MALA
to inherit a minorization condition from the true solution. To sharpen the estimate,
one could retrace the steps of the proof in [BT95] and replace boundedness of the
coefficients by some coercivity.

Lemma 4.1. Let U be a potential satisfying Assumption 2.1. Let $P_h$ and $Q_h$ denote
the transition probability of patched MALA and the true solution, respectively.
Then, for every $T > 0$, there exists $C(T) > 0$ such that for all $h < 1$, the bound

$$ \|P_h^{\lfloor t/h \rfloor}(x, \cdot) - Q_h^{\lfloor t/h \rfloor}(x, \cdot)\|_{TV} \le C(T) \sqrt{h}\, U^3(x) , \qquad (4.1) $$
is valid for all $x \in \mathbf{R}^n$ and all $t \in [0, T]$.

Proof. This estimate is a consequence of Lemmas 4.2 and 4.6 below. Let $\tilde P_h$ denote
the transition probability of forward Euler (2.12). The triangle inequality implies
that

$$ \|P_h^{\lfloor t/h \rfloor}(x, \cdot) - Q_h^{\lfloor t/h \rfloor}(x, \cdot)\|_{TV} \le \|P_h^{\lfloor t/h \rfloor}(x, \cdot) - \tilde P_h^{\lfloor t/h \rfloor}(x, \cdot)\|_{TV} + \|\tilde P_h^{\lfloor t/h \rfloor}(x, \cdot) - Q_h^{\lfloor t/h \rfloor}(x, \cdot)\|_{TV} . $$

According to Lemma 4.6, the first term is bounded by $C(T)\sqrt{h}\, U^3(x)$. According
to Lemma 4.2, the second term is bounded by $C(T)\sqrt{h}\, U^2(x)$. Hence, the desired
error estimate is obtained. □

Lemma 4.2. Let U be a potential satisfying Assumption 2.1. Let $\tilde P_h$ and $Q_h$ denote
the transition probability of forward Euler and the true solution, respectively. Then,
for every $T > 0$, there exists $C(T) > 0$ such that for all $h < 1$

$$ \|\tilde P_h^{\lfloor t/h \rfloor}(x, \cdot) - Q_h^{\lfloor t/h \rfloor}(x, \cdot)\|_{TV} \le C(T) \sqrt{h}\, U^2(x) , $$
for all $x \in \mathbf{R}^n$ and all $t \in [0, T]$.

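Proof. We bound the TV distance between forward Euler and the true solution
using Lemmas 4.3, 4.4 and 4.5 as follows. Using the triangle inequality, we split
the quantity that we wish to bound as

$$ \|\tilde P_h^{\lfloor t/h \rfloor}(x, \cdot) - Q_h^{\lfloor t/h \rfloor}(x, \cdot)\|_{TV} \le \|\tilde P_h^{\lfloor t/h \rfloor}(x, \cdot) - (\tilde P_h \circ Q_h^{\lfloor t/h \rfloor - 1})(x, \cdot)\|_{TV} $$
$$ \qquad + \|(\tilde P_h \circ Q_h^{\lfloor t/h \rfloor - 1})(x, \cdot) - Q_h^{\lfloor t/h \rfloor}(x, \cdot)\|_{TV} \stackrel{\text{def}}{=} I_1 + I_2 . \qquad (4.2) $$

The first term of (4.2) can be bounded by

$$ I_1 \le \mathbf{E}_x \|\tilde P_h(X_{\lfloor t/h \rfloor - 1}, \cdot) - \tilde P_h(Y_{\lfloor t/h \rfloor - 1}, \cdot)\|_{TV} , $$
which, using Lemma 4.4, is bounded by

$$ I_1 \le C\, \mathbf{E}_x\Big( \big( h^{-1/2}\, |X_{\lfloor t/h \rfloor - 1} - Y_{\lfloor t/h \rfloor - 1}| + h^{1/2}\, |\nabla U(X_{\lfloor t/h \rfloor - 1}) - \nabla U(Y_{\lfloor t/h \rfloor - 1})| \big) \wedge 1 \Big) . $$

Strong accuracy of forward Euler in a bounded metric (see Lemma 4.3) then yields

$$ I_1 \le C \sqrt{h}\, U^2(x) . $$
The second term of (4.2) is bounded by

$$ I_2 \le \mathbf{E}_x\big( \|\tilde P_h(Y_{\lfloor t/h \rfloor - 1}, \cdot) - Q_h(Y_{\lfloor t/h \rfloor - 1}, \cdot)\|_{TV} \big) . $$

From Lemma 4.5 and (2.7), it follows that $I_2$ is bounded by $C h\, U^2(x)$, and the
claim follows. □

Even though forward Euler is numerically unstable for drifts that are not globally
Lipschitz, one can prove the following 'strong accuracy' for forward Euler in
a bounded metric. As the proof shows, boundedness of the metric plays the role of
stability of the numerical scheme.

Lemma 4.3. Let U be a potential satisfying Assumption 2.1. Let X and Y denote
forward Euler and the true solution, respectively. Then, for every $T > 0$ there
exists $C(T) > 0$ such that

$$ \mathbf{E}_x\big( |X_{\lfloor t/h \rfloor} - Y(\lfloor t/h \rfloor h)| \wedge 1 \big) \le C(T)\, h\, U^2(x) $$

holds for all $x \in \mathbf{R}^n$, all $h < 1$, and all $t \in [0, T]$.

Proof. The proof goes by induction over the number of steps, so let us consider
one single step first. We then have

$$ X_1 - Y_h = X_0 - Y_0 - h\big(\nabla U(X_0) - \nabla U(Y_0)\big) + \int_0^h \big(\nabla U(Y_s) - \nabla U(Y_0)\big)\, ds , $$

so that

$$ |X_1 - Y_h|^2 = |X_0 - Y_0|^2 - 2h \langle X_0 - Y_0, \nabla U(X_0) - \nabla U(Y_0) \rangle + 2 \int_0^h \langle X_0 - Y_0, \nabla U(Y_s) - \nabla U(Y_0) \rangle\, ds $$
$$ \qquad + \Big| h\big(\nabla U(X_0) - \nabla U(Y_0)\big) - \int_0^h \big(\nabla U(Y_s) - \nabla U(Y_0)\big)\, ds \Big|^2 . $$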




Together with Remark 2.4, this implies that there exists a constant $C$ such that

$$ |X_1 - Y_h|^2 \le (1 + Ch)\, |X_0 - Y_0|^2 + 2h^2\, |\nabla U(X_0) - \nabla U(Y_0)|^2 $$
$$ \qquad + 2 \int_0^h \langle X_0 - Y_0, \nabla U(Y_s) - \nabla U(Y_0) \rangle\, ds + 2h \int_0^h |\nabla U(Y_s) - \nabla U(Y_0)|^2\, ds . \qquad (4.3) $$

Note now that if $\eta$ is any unit vector in $\mathbf{R}^n$, we have the identity

$$ \langle \nabla U(Y_s) - \nabla U(Y_0), \eta \rangle = \int_0^s \mathcal{L}\langle \nabla U, \eta \rangle (Y_r)\, dr + \sqrt{2\beta^{-1}} \int_0^s D^2 U(Y_r)(\eta, dW_r) . $$

Since $\|D^2 U\| \le CU$ and $|\mathcal{L}\langle \nabla U, \eta \rangle| \le C U^2$, it then follows from Remark 2.3
that there exists a constant $C$ such that

$$ \mathbf{E}\, |\nabla U(Y_s) - \nabla U(Y_0)|^2 \le C s\, U^4(Y_0) , \qquad \forall s \le 1 , $$
$$ |\mathbf{E} \langle \eta, \nabla U(Y_s) - \nabla U(Y_0) \rangle| \le C s\, U^2(Y_0) , \qquad \forall s \le 1 . $$

On the other hand, one also has the bound

$$ |\nabla U(X_0) - \nabla U(Y_0)|^2 \le C\, |X_0 - Y_0|^2 \exp(C |X_0 - Y_0|)\, U^2(Y_0) , $$

which follows from Assumption 2.1 (D) and Lemma 5.1 below. In the case where
$|X_0 - Y_0| \le 1$, this yields

$$ |\nabla U(X_0) - \nabla U(Y_0)|^2 \le h^{-1} |X_0 - Y_0|^2 + h C\, U^4(Y_0) . $$

Inserting these bounds into (4.3), we see that there is $C > 0$ such that if
$|X_0 - Y_0| \le 1$, then

$$ \mathbf{E}\, |X_1 - Y_h|^2 \le (1 + Ch)\, |X_0 - Y_0|^2 + C h^3\, U^4(Y_0) . $$

Since on the other hand, one obviously has $\mathbf{E}(|X_1 - Y_h|^2 \wedge 1) \le 1$, we conclude
that

$$ \mathbf{E}(|X_1 - Y_h|^2 \wedge 1) \le (1 + Ch)\big(|X_0 - Y_0|^2 \wedge 1\big) + C h^3\, U^4(Y_0) . $$

The requested bound now follows from the a priori bounds on the solution $Y_t$
given by Remark 2.3. □

Lemma 4.4. Let U be a potential satisfying Assumption 2.1. Let $\tilde P_h$ denote the
transition probability of forward Euler. For every $h < 1$ and for all $x, y \in \mathbf{R}^n$,

$$ \|\tilde P_h(x, \cdot) - \tilde P_h(y, \cdot)\|_{TV} \le \frac{1}{\sqrt{2\beta^{-1}h}}\, |x - y| + \frac{\sqrt{h}}{\sqrt{2\beta^{-1}}}\, |\nabla U(x) - \nabla U(y)| . $$
Proof. Recalling Pinsker's inequality:

$$ \|\mathcal{N}(0, \sigma^2 I) - \mathcal{N}(\mu, \sigma^2 I)\|_{TV} \le \frac{|\mu|}{\sigma} , $$

we see that the claim follows from the fact that

$$ \tilde P_h(x, \cdot) = \mathcal{N}\big(x - h\nabla U(x),\, 2\beta^{-1}h\, I\big) , $$

where $I$ denotes the identity matrix. □
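
As a small illustration of the Gaussian form of the forward Euler kernel used in this proof, the sketch below draws samples from one Euler step and compares their empirical mean and variance with the stated mean $x - h\nabla U(x)$ and variance $2\beta^{-1}h$ per coordinate. The quartic potential and the helper names `grad_U`, `euler_step` and `euler_mean_and_var` are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def grad_U(x):
    # Hypothetical potential U(x) = 1 + |x|^4 / 4, so grad U(x) = |x|^2 x.
    return np.dot(x, x) * x

def euler_step(x, h, beta, rng):
    """One forward Euler step: a draw from N(x - h grad U(x), (2 h / beta) I)."""
    xi = rng.standard_normal(x.shape)
    return x - h * grad_U(x) + np.sqrt(2.0 * h / beta) * xi

def euler_mean_and_var(x, h, beta):
    """Mean vector and per-coordinate variance of the Gaussian kernel."""
    return x - h * grad_U(x), 2.0 * h / beta

rng = np.random.default_rng(0)
x0 = np.array([1.0, -0.5])
h, beta = 0.01, 2.0

samples = np.array([euler_step(x0, h, beta, rng) for _ in range(20000)])
mean, var = euler_mean_and_var(x0, h, beta)

print("empirical mean:", samples.mean(axis=0), "kernel mean:", mean)
print("empirical var: ", samples.var(axis=0), "kernel var: ", var)
```

Because the kernel is an isotropic Gaussian, the total variation distance between two such kernels depends only on the distance between their means relative to $\sqrt{2\beta^{-1}h}$, which is what the lemma exploits.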

Lemma 4.5. Let U be a potential satisfying Assumption 2.1. Let $\tilde P_h$ and $Q_h$ denote
the transition probability of forward Euler and the true solution, respectively. Then,
there exists $C > 0$ such that, for every $h < 1$, the bound

$$ \|\tilde P_h(x, \cdot) - Q_h(x, \cdot)\|_{TV} \le C h\, U^2(x) $$

holds for all $x \in \mathbf{R}^n$.

Proof. We write $E_0 = U(x)$ as a shorthand. The bound is trivial if $E_0^2 h \ge 1$,
so we can and will assume in the sequel that $E_0^2 h \le 1$. Recall that the transition
probabilities $Q_h$ are generated by the solutions at time $h$ to

$$ dY = -\nabla U(Y)\, dt + \sqrt{2\beta^{-1}}\, dW , \qquad Y(0) = x , \qquad (4.4) $$

whereas the transition probabilities $\tilde P_h$ of forward Euler can be interpreted as the
solution at time $h$ to

$$ dX = -\nabla U(x)\, dt + \sqrt{2\beta^{-1}}\, dW , \qquad X(0) = x . \qquad (4.5) $$

Therefore, the required quantity can be bounded from above by the total variation
distance between the measures generated by (4.4) and (4.5) on pathspace between
times $0$ and $h$. Since only the drift differs in the SDEs (4.4) and (4.5), Girsanov's
theorem can be used to quantify the distance between the laws of the solutions at
time $h$ to (4.4) and (4.5).

We first replace the potential $U$ by a modified potential $\tilde U$ which is bounded,
together with all of its derivatives. Indeed, let $\varphi : \mathbf{R}_+ \to \mathbf{R}$ be a smooth increasing
function such that $\varphi(x) = x$ for $x \le 2$ and $\varphi(x) = 3$ for $x \ge 4$. With this definition
at hand, we set

$$ \tilde U(y) = U(x)\, \varphi\big(U(y)/U(x)\big) . $$
It then follows from Assumption 2.1 (D) that there exists a constant $C$ such that

$$ |\tilde U(y)| + \|D\tilde U(y)\| + \|D^2 \tilde U(y)\| \le C E_0 , \qquad (4.6) $$

uniformly over all $y \in \mathbf{R}^n$.

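Before we proceed, we argue that if we define $\tilde Y$ by

$$ d\tilde Y = -\nabla \tilde U(\tilde Y)\, dt + \sqrt{2\beta^{-1}}\, dW , \qquad \tilde Y(0) = x , \qquad (4.7) $$
then one has $\mathbf{P}(\exists t \le h : Y(t) \ne \tilde Y(t)) \le C E_0^2 h$, so that we can replace $U$
by $\tilde U$ in (4.4) without any loss of generality. In order to show this, we note that
Lemma 2.5 yields the existence of a constant $K$ such that $M(t) = U(Y(t)) - Kt - U(x)$
is a supermartingale with quadratic variation process

$$ \langle M, M \rangle(t) = 2\beta^{-1} \int_0^t |\nabla U(Y(s))|^2\, ds . $$

Furthermore, for $E_0$ sufficiently large (independently of $h$), one has

$$ \mathbf{P}(\exists t \le h : Y(t) \ne \tilde Y(t)) \le \mathbf{P}\Big( \sup_{t \le h} M(t) \ge \tfrac12 U(x) \Big) . $$

It then follows from the exponential martingale inequality [RY99, p. 153] that, for
every $A > 0$, one has the bound

$$ \mathbf{P}(\exists t \le h : Y(t) \ne \tilde Y(t)) \le \exp\big(-U^2(x)/(8A)\big) + \mathbf{P}\big(\langle M, M \rangle(h) \ge A\big) . $$

For $\delta > 0$ sufficiently small, the second term in this expression can then be
bounded by

$$ \mathbf{P}\big(\langle M, M \rangle(h) \ge A\big) \le \exp\big(-\sqrt{\delta A h^{-1}}\big)\, \mathbf{E} \exp\Big( \sqrt{\delta h^{-1} \langle M, M \rangle(h)} \Big) $$
$$ \qquad \le \exp\big(-\sqrt{\delta A h^{-1}}\big)\, \frac{1}{h} \int_0^h \mathbf{E} \exp\Big( \sqrt{2\delta\beta^{-1}\, |\nabla U(Y(s))|^2} \Big)\, ds $$
$$ \qquad \le \exp\big(-\sqrt{\delta A h^{-1}}\big)\, \frac{1}{h} \int_0^h \mathbf{E} \exp\big( c\sqrt{\delta}\, U(Y(s)) \big)\, ds $$
$$ \qquad \le C \exp\big(-\sqrt{\delta A h^{-1}}\big) \exp\big( c\sqrt{\delta}\, U(x) \big) . $$

Here, we have first used Chebychev's inequality, followed by Jensen's inequality,
then Assumption 2.1 (D), and finally Lemma 2.5 with $\delta$ small enough. Setting
$A = U^2(x)\, h^{1/3}$, so that both exponents are of order $h^{-1/3}$, it follows that for $h$ small
enough we actually have $\mathbf{P}(\exists t \le h : Y(t) \ne \tilde Y(t)) \le 2\exp(-c h^{-1/3})$ for some
positive constant $c$, which is much better than needed.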

We now proceed by comparing the true solution and forward Euler for $\tilde U$. Denote
now by $\mathcal{Q}_h$ the measure on pathspace generated by (4.7), by $\mathcal{P}_h$ the measure
on pathspace generated by solutions to (4.5), and by $\mathcal{W}_h$ Wiener measure on
$C([0, h], \mathbf{R}^n)$ with starting point $x$. It then follows from Girsanov's theorem that

Z Q 1 txp( K --jL={U{W h ) - U(x)) - p G(W t )dt) , 

Zp 1 exp(^-- / ==VU(x) T (W h -x)- h(3\VU(x)\ 2 ^j , 

for some normalisation factors Zp and Zq, where the function G is given by 



dQ h 
dW h 
dV h 
dW h 



(W) 



(W) 



G(x) = |V?7(x)| 2 - AU(x) 
(See for example [Elw82, Theorem 11A].) In particular, we have

^W) = Z^tx V [--j==(lJ{W h ) - U(x) - X7U(xf(W h - x)) 

- J p(G(Wt) ~ \VU(x)fj dt) = Z~ l txp{V h {W)) , 
where the normalisation constant Zh is given by 



Z h = J ex V (D h )dV h . 

By (4.6), there exists a constant $C > 0$ such that the bound

$$ |\mathcal{D}_h(W)| \le C E_0 \big( |W_h - x|^2 + E_0 h \big) , \qquad (4.8) $$

holds for every $W$. As an immediate consequence, for every $c > 0$, there exists a
constant $C > 0$ such that

$$ \Big| \log \int \exp(c\, \mathcal{D}_h)\, d\mathcal{P}_h \Big| \le C E_0^2 h $$

for every $h < 1$. In particular, one has $Z_h = 1 + O(E_0^2 h)$ and similarly for $Z_h^{-1}$.
Denote now by $B_h$ the set

$$ B_h = \{ W : |\mathcal{D}_h(W)| \ge 1 \} . $$

It follows from the bound (4.8) that $\mathcal{P}_h(B_h) \le C \exp(-c/(h E_0))$ for some $c, C > 0$
and for $h E_0^2 \le 1$.
We conclude that

$$ \|\mathcal{Q}_h - \mathcal{P}_h\|_{TV} = \int |1 - Z_h^{-1} \exp(\mathcal{D}_h)|\, d\mathcal{P}_h \le C \int_{B_h^c} |\mathcal{D}_h|\, d\mathcal{P}_h + O(E_0^2 h) + \int_{B_h} |1 - Z_h^{-1} \exp(\mathcal{D}_h)|\, d\mathcal{P}_h $$
$$ \qquad \le O(E_0^2 h) + \Big( \int \exp(2\mathcal{D}_h)\, d\mathcal{P}_h - 1 \Big) = O(E_0^2 h) , $$

as required. In the last step, we have used the Cauchy-Schwarz inequality. □

Lemma 4.6. For every $T > 0$, there exists a $C(T) > 0$ such that

$$ \sup_{t \in [0, T]} \|P_h^{\lfloor t/h \rfloor}(x, \cdot) - \tilde P_h^{\lfloor t/h \rfloor}(x, \cdot)\|_{TV} \le C(T) \sqrt{h}\, U^3(x) $$

holds for every $h < 1$ and for every $x \in \mathbf{R}^n$.
Proof. Denote by $\tilde X_k$ the solution to the forward Euler algorithm after $k$ steps and
by $X_k$ the solution to the MALA algorithm. Since both agree until the first time
that one step is rejected, it follows from the coupling inequality that we have the
bound

$$ \|P_h^n(x, \cdot) - \tilde P_h^n(x, \cdot)\|_{TV} \le 2 \sum_{k=0}^{n-1} \mathbf{E}_x |1 - \alpha_h(\tilde X_k, \tilde X_{k+1})| . $$

At this stage, we note that since $\alpha_h \in [0, 1]$, it follows from Lemma 5.5 that for
every $a > 0$ there exists a $C > 0$ such that the bound

$$ \mathbf{E}_x |1 - \alpha_h(x, X_1^\star)| \le C h^{3/2} \big( U(x) \wedge a h^{-1/2} \big)^3 $$

holds for all $x \in \mathbf{R}^n$. This is simply because this bound is trivial for $U(x) \ge a/\sqrt{h}$.

Making $a$ sufficiently small and combining this with Corollary 5.7, we then
obtain

$$ \mathbf{E}_x |1 - \alpha_h(\tilde X_k, \tilde X_{k+1})| \le C h^{3/2}\, \mathbf{E}_x \big( U(\tilde X_k) \wedge a h^{-1/2} \big)^3 \le C h^{3/2} \big( U^3(x) + K h k \big) , $$

for some constant $K > 0$. The claim now follows at once by summing over $k$. □
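
The coupling used in this proof can be illustrated numerically: if forward Euler and MALA are driven by the same Gaussian increments, the two trajectories coincide exactly until the first rejected MALA proposal. The sketch below is a minimal illustration under assumptions: the quartic potential, the function names and the parameter values are hypothetical, and the acceptance probability is written in the standard Metropolis-Hastings form $1 \wedge \exp(-\beta G)$ with $G$ as in the proof of Lemma 5.5.

```python
import numpy as np

def U(x):
    # Hypothetical quartic potential, used only for illustration.
    return 1.0 + 0.25 * np.dot(x, x) ** 2

def grad_U(x):
    return np.dot(x, x) * x

def alpha(x, y, h, beta):
    """MALA acceptance probability min(1, exp(-beta * G(x, y)))."""
    d = y - x
    gx, gy = grad_U(x), grad_U(y)
    G = (U(y) - U(x) - 0.5 * np.dot(gy + gx, d)
         + 0.25 * h * (np.dot(gy, gy) - np.dot(gx, gx)))
    # Clamp the exponent at 0 to avoid overflow when G is very negative.
    return float(np.exp(min(0.0, -beta * G)))

def coupled_paths(x0, h, beta, n_steps, rng):
    """Run forward Euler and MALA with shared noise.

    Returns both paths and the index of the first rejected MALA step
    (None if every proposal was accepted).
    """
    euler = [np.array(x0, dtype=float)]
    mala = [np.array(x0, dtype=float)]
    first_reject = None
    scale = np.sqrt(2.0 * h / beta)
    for k in range(n_steps):
        xi = rng.standard_normal(len(x0))
        u = rng.uniform()
        # Forward Euler always moves.
        xe = euler[-1]
        euler.append(xe - h * grad_U(xe) + scale * xi)
        # MALA proposes with the same noise and accepts with probability alpha.
        xm = mala[-1]
        proposal = xm - h * grad_U(xm) + scale * xi
        if u < alpha(xm, proposal, h, beta):
            mala.append(proposal)
        else:
            mala.append(xm.copy())
            if first_reject is None:
                first_reject = k
    return np.array(euler), np.array(mala), first_reject

rng = np.random.default_rng(1)
euler, mala, first_reject = coupled_paths([1.5, -1.0], h=0.01, beta=1.0,
                                          n_steps=100, rng=rng)
print("first rejected step:", first_reject)
```

Under this coupling, the probability that the two chains differ after $n$ steps is at most the probability that some proposal among the first $n$ was rejected, which is the quantity the sum of rejection probabilities in the proof controls.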

5 Local Drift Conditions

This section shows that the single-step accuracy of MALA and forward Euler implies
that these algorithms preserve Lyapunov functions of the true solution locally.
We refer to this property of a numerical method as a local drift condition. In the
lemmas that follow, local drift conditions are derived for the MALA and forward
Euler algorithms. Deriving such drift conditions requires adapting Theorem 7.2
of [MSH02] to Lyapunov functions that are neither globally Lipschitz nor essentially
quadratic. Still, the proofs in this section are strongly inspired by the results
in [MSH02].

A key technical issue addressed below is that the natural Lyapunov function of
the true solution, namely $\Phi(x) = \exp(\theta U(x))$, grows so fast that it is not in general
integrable with respect to a Gaussian measure. In particular, it is not integrable
with respect to the transition probabilities of forward Euler. Nevertheless, we will
show that the expectation of $\Phi$ under one step of MALA is finite and close to the
expectation of $\Phi$ under the true solution. Integrability of $\Phi$ with respect to the
transition probability of MALA is a consequence of MALA preserving an equilibrium
measure whose tails are lighter than Gaussian.

A first remark which will be useful in this section is that under our assumptions
on the potential U, it does not behave 'worse than exponential' in the following
sense:

Lemma 5.1. There exists $C > 0$ such that for every $x, y \in \mathbf{R}^n$, we have

$$ |U(x)| \le |U(y)| \exp(C |x - y|) . $$
Proof. It suffices to differentiate the function $t \mapsto U((1 - t)x + ty)$, invoke
Assumption 2.1 (D), and apply Gronwall's inequality over the interval $t \in [0, 1]$. □
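
The Gronwall step can be sketched as follows. The sketch assumes, as is used elsewhere in this section, that Assumption 2.1 (D) provides $|\nabla U| \le C U$ and that $U$ is positive; the auxiliary function $g$ is notation introduced here for illustration.

```latex
\begin{align*}
g(t) &:= U\big((1-t)x + ty\big), \qquad
|g'(t)| = \big|\langle \nabla U((1-t)x + ty),\, y - x \rangle\big| \le C\, |x - y|\, g(t), \\
\frac{d}{dt} \log g(t) &\ge -C\, |x - y|
\quad\Longrightarrow\quad
\log g(1) - \log g(0) \ge -C\, |x - y| , \\
U(x) = g(0) &\le g(1)\, e^{C|x-y|} = U(y)\, e^{C|x-y|} .
\end{align*}
```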

Proposition 5.2. Set $\Phi(x) = \exp(\theta U(x))$. Let $X_1$ denote MALA after one step.
Then there exist positive constants $C$ and $\theta \in (0, \beta)$ such that the bound

$$ \mathbf{E}_x(\Phi(X_1)) \le \big( e^{-\gamma h} + C\, U^4(x)\, h^2 \big) \Phi(x) + \frac{K}{\gamma} \big( 1 - e^{-\gamma h} \big) $$

holds for all $x \in \mathbf{R}^n$ satisfying $U(x) \le h^{-1/2}$.

Proof. Denoting by $Y(h)$ the true solution after time $h$, we write

$$ \mathbf{E}_x(\Phi(X_1)) = \mathbf{E}_x(\Phi(Y(h))) + \mathbf{E}_x\big( \Phi(X_1) - \Phi(Y(h)) \big) . $$
We know from (2.7) that $\Phi$ is a Lyapunov function for the true solution, and hence,

$$ \mathbf{E}_x(\Phi(X_1)) \le e^{-\gamma h}\, \Phi(x) + \frac{K}{\gamma} \big( 1 - e^{-\gamma h} \big) + \big| \mathbf{E}_x\big( \Phi(X_1) - \Phi(Y(h)) \big) \big| . $$

The approximation result between MALA and the true solution given in Lemma 5.3
below then implies the desired result. □

The following lemma states that the single step error of MALA in preserving
$\Phi$ is $O(h^2)$ with an error constant that depends on $\Phi(y)$ and $U^4(y)$ evaluated at the
initial condition.

Lemma 5.3. Set $\Phi(x) = \exp(\theta U(x))$. Let $X_1$ and $Y(h)$ denote MALA and the
true solution after one step, respectively. Then there exist positive constants $C$ and
$\theta \in (0, \beta)$ such that the bound

$$ \big| \mathbf{E}_x\big( \Phi(X_1) - \Phi(Y(h)) \big) \big| \le C\, U^4(x)\, \Phi(x)\, h^2 \qquad (5.1) $$

holds for all $x \in \mathbf{R}^n$ satisfying $U(x) \le h^{-1/2}$.

Remark 5.4. Note in particular that the bound (5.1) implies that $\mathbf{E}_x \Phi(X_1) < \infty$.
This is not obvious a priori since $\Phi(x)$ grows faster than $\exp |x|^2$ at infinity. As a
consequence, this expectation is infinite under the proposal moves.

Proof. Applying Itô's formula twice to the exact solution yields

$$ \mathbf{E}_x(\Phi(Y(h))) = \Phi(x) + h (\mathcal{L}\Phi)(x) + h^2 \int_0^1 (1 - t)\, \mathbf{E}_x\big( \mathcal{L}^2 \Phi(Y(ht)) \big)\, dt , \qquad (5.2) $$

where $\mathcal{L}$ denotes the generator as in (2.3).
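Setting $X(s) = x + s(X_1^\star - x)$, we obtain by a simple application of Taylor's
formula the following identity for the application of one step of MALA:

$$ \mathbf{E}_x(\Phi(X_1)) = \Phi(x) + \sum_{k=1}^{3} \frac{1}{k!}\, \mathbf{E}_x\big( D^k \Phi(x)(X_1^\star - x)^k\, \alpha_h(x, X_1^\star) \big) + R_2 , \qquad (5.3) $$

where $R_2$ denotes the fourth-order Taylor remainder, which involves $D^4 \Phi(X(t))(X_1^\star - x)^4$.
(Here we interpret $D^4 \Phi(y)(x)^4$ as being the quadrilinear form $D^4 \Phi(y)$ applied to
$(x, x, x, x)$.) Subtracting (5.2) from (5.3) and using the definition of the forward
Euler proposal move to collect this difference in powers of $h$, we obtain:

$$ \mathbf{E}_x\big( \Phi(X_1) - \Phi(Y(h)) \big) = h^{1/2} I_{1/2} + h I_1 + h^{3/2} I_{3/2} + h^2 I_2 + R_2 . \qquad (5.4) $$

Here we have introduced, with $\xi$ the Gaussian increment of the proposal move:

$$ I_{1/2} = \sqrt{2\beta^{-1}}\, \mathbf{E}_x\big( \alpha_h(x, X_1^\star)\, D\Phi(x)\xi \big) , $$
$$ I_1 = -\theta\, \Phi(x)\, |\nabla U(x)|^2\, \mathbf{E}_x\big( \alpha_h(x, X_1^\star) - 1 \big) + \beta^{-1}\, \mathbf{E}_x\big( D^2 \Phi(x)(\xi, \xi)\, (\alpha_h(x, X_1^\star) - 1) \big) , $$
$$ I_{3/2} = -\sqrt{2\beta^{-1}}\, \mathbf{E}_x\big( D^2 \Phi(x)(\xi, \nabla U(x))\, \alpha_h(x, X_1^\star) \big) + \tfrac16 (2\beta^{-1})^{3/2}\, \mathbf{E}_x\big( D^3 \Phi(x)(\xi, \xi, \xi)\, \alpha_h(x, X_1^\star) \big) , $$
$$ I_2 = \tfrac12\, \mathbf{E}_x\big( D^2 \Phi(x)(\nabla U(x), \nabla U(x))\, \alpha_h(x, X_1^\star) \big) $$
$$ \qquad - \tfrac16\, \mathbf{E}_x\big( D^3 \Phi(x)\big( \nabla U(x),\, -\sqrt{h}\nabla U(x) + \sqrt{2\beta^{-1}}\xi,\, -\sqrt{h}\nabla U(x) + \sqrt{2\beta^{-1}}\xi \big)\, \alpha_h(x, X_1^\star) \big) $$
$$ \qquad - \tfrac16 \sqrt{2\beta^{-1}}\, \mathbf{E}_x\big( D^3 \Phi(x)\big( \xi,\, \nabla U(x),\, -\sqrt{h}\nabla U(x) + \sqrt{2\beta^{-1}}\xi \big)\, \alpha_h(x, X_1^\star) \big) $$
$$ \qquad - \tfrac16 (2\beta^{-1})\, \mathbf{E}_x\big( D^3 \Phi(x)(\xi, \xi, \nabla U(x))\, \alpha_h(x, X_1^\star) \big) - \int_0^1 (1 - t)\, \mathbf{E}_x\big( \mathcal{L}^2 \Phi(Y(ht)) \big)\, dt . $$

We now bound each of these terms separately. The estimates that follow will often
rely on the hypothesis that $U(x) \le 1/\sqrt{h}$ together with Assumption 2.1 (D), which
implies that the $\ell$th derivative of $\Phi$ satisfies:

$$ \|D^\ell \Phi(x)\| \le C\, U^\ell(x)\, \Phi(x) , \qquad (5.5) $$
for $\ell = 1, 2, 3, 4$.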

Since the term $I_{1/2}$ in (5.4) involves an odd function of $\xi$, one can rewrite it as

$$ \frac{1}{\sqrt{2\beta^{-1}}}\, I_{1/2} = \mathbf{E}_x\big( \alpha_h(x, X_1^\star)\, D\Phi(x)\xi \big) = \mathbf{E}_x\big( (\alpha_h(x, X_1^\star) - 1)\, D\Phi(x)\xi \big) . $$

Using (5.5), we infer that

$$ |I_{1/2}| \le C\, U(x)\, \Phi(x)\, \big( \mathbf{E}_x |1 - \alpha_h(x, X_1^\star)|^2 \big)^{1/2} \le C\, U^3(x)\, \Phi(x)\, h^{3/2} , $$

where we used Lemma 5.5 in the last inequality. One can similarly bound $I_{3/2}$
since it also involves an odd function of $\xi$. The term $I_1$ is of the form where
Lemma 5.5 can be directly applied after using the Cauchy-Schwarz inequality and
Assumption 2.1 (D). The terms in $I_2$ without integrals are bounded in a similar
fashion, but without the need to invoke Lemma 5.5.
Note now that

$$ |\mathcal{L}^2 \Phi(y)| \le C\, U^4(y)\, \Phi(y) , $$

which is a Lyapunov function for the true solution by Remark 2.6, so that the
integrand appearing in $I_2$ is bounded by:

$$ \big| \mathbf{E}_x\big( \mathcal{L}^2 \Phi(Y(r)) \big) \big| \le C\, \mathbf{E}_x\big( U^4(Y(r))\, \Phi(Y(r)) \big) \le C\, U^4(x)\, \Phi(x) . \qquad (5.6) $$

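Finally, we describe how to bound $R_2$ in (5.4). It follows from (5.5) that

$$ |R_2| = \Big| \frac{1}{4!} \int_0^1 (1 - t)^4\, \mathbf{E}_x\big( D^4 \Phi(X(t))(X_1^\star - x)^4\, \alpha_h(x, X_1^\star) \big)\, dt \Big| $$
$$ \qquad \le C \sup_{s \in [0,1]} \mathbf{E}_x\big( U^4(X(s))\, \Phi(X(s))\, |X_1^\star - x|^4\, \alpha_h(x, X_1^\star) \big) , $$

so that our claim follows if we can show that the bound

$$ \mathbf{E}\big( U^4(X(s))\, \Phi(X(s))\, |X^\star(\xi) - x|^4\, \alpha_h(x, X^\star(\xi)) \big) \le C\, U^4(x)\, \Phi(x)\, h^2 , \qquad (5.7) $$

holds uniformly for $s \in [0, 1]$, where $\xi$ is a normally distributed random variable.
Here, we have introduced the shorthand notation

$$ X^\star(\xi) = x - h\nabla U(x) + \sqrt{2h\beta^{-1}}\, \xi . $$

Note that for all $x$ satisfying $U(x) \le 1/\sqrt{h}$, we have the bound

$$ |X^\star(\xi) - x| \le c\sqrt{h}\, (1 + |\xi|) . \qquad (5.8) $$

Hence, to prove (5.7) it suffices to show that

$$ \mathbf{E}\big( (1 + |\xi|^4)\, U^4(X(s))\, \Phi(X(s))\, \alpha_h(x, X^\star(\xi)) \big) \le C\, U^4(x)\, \Phi(x) . $$

We can then use the Cauchy-Schwarz inequality to get rid of the factor $(1 + |\xi|^4)$,
so that it suffices to show that

$$ \mathbf{E}\big( F_\theta(U(X(s)))\, \alpha_h(x, X^\star(\xi)) \big) \le C\, F_\theta(U(x)) , \qquad (5.9) $$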
where we defined the shorthand notation

$$ F_\theta(u) = u^8 e^{2\theta u} . $$

Our next step is to turn the occurrences of $X(s)$ in this expression into $X^\star(\xi)$. In
order to do this, we use the fact that Assumption 2.1 (C) implies that $U$ is 'almost'
convex. Indeed, choose any $x, y \in \mathbf{R}^n$ and set $x_s = (1 - s)x + sy$, so that one has
the identity

$$ U(x_s) = (1 - s) U(x) + s U(y) + s(1 - s) \int_0^1 \langle \nabla U(x_{st}) - \nabla U(x_{st+1-t}),\, y - x \rangle\, dt . $$

Since $x_{st+1-t} - x_{st} = (1 - t)(y - x)$, it then follows from Assumption 2.1 (C) that

$$ U(x_s) \le (1 - s) U(x) + s U(y) + C s (1 - s)\, |x - y|^2 \int_0^1 (1 - t)\, dt $$
$$ \qquad \le (1 - s) U(x) + s U(y) + C |x - y|^2 , \qquad (5.10) $$

for some constant $C$ independent of $s \in [0, 1]$. Note also that there exists a constant
$C$ such that the bound

$$ F_\theta(u + v) \le C\, F_\theta(u) \exp(C v) , \qquad (5.11) $$

holds uniformly for all $u, v$ such that $u \ge 1$ and $v \ge 0$.

Since furthermore $F_\theta$ is convex, we deduce from (5.11) and (5.10) that

$$ F_\theta(U(X(s))) \le C \exp\big( C |X^\star(\xi) - x|^2 \big) \Big( (1 - s)\, F_\theta(U(x)) + s\, F_\theta(U(X^\star(\xi))) \Big) . $$

To bound the first term in this expression, note that it follows from (5.8) that

$$ \mathbf{E} \exp\big( C |X^\star(\xi) - x|^2 \big) \le \mathbf{E} \exp\big( C h (1 + |\xi|^2) \big) \le C , $$

so that it is bounded by some multiple of $F_\theta(U(x))$.

Combining this bound with (5.9) and the Cauchy-Schwarz inequality, we conclude
that it remains to show that

$$ \mathbf{E}\big( F_\theta^2(U(X^\star(\xi)))\, \alpha_h(x, X^\star(\xi)) \big) \le C\, F_\theta^2(U(x)) . $$

Since $\alpha_h$ is bounded, we can reduce ourselves to showing that

$$ \mathbf{E}\big( F_\theta^2(U(X^\star(\xi)))\, \alpha_h(x, X^\star(\xi)),\ U(X^\star(\xi)) \ge U(x) \big) \le C\, F_\theta^2(U(x)) . \qquad (5.12) $$

Note now that one has from the definition (2.16) of $\alpha_h$ the bound

$$ \alpha_h(x, y) \le \frac{q_h(y, x)}{q_h(x, y)} \exp\big( \beta U(x) - \beta U(y) \big) , $$
where $q_h$ denotes the one-step transition probabilities for forward Euler. The
left-hand side of (5.12) can therefore be bounded by

$$ \int_{U(y) \ge U(x)} F_\theta^2(U(y))\, q_h(y, x) \exp\big( \beta U(x) - \beta U(y) \big)\, dy . $$

We break this integral into two regions by setting

$$ \mathcal{R}_1 = \{ y : U(x) \le U(y) \le a h^{-1/2} \} , \qquad \mathcal{R}_2 = \{ y : U(y) \ge a h^{-1/2} \} , $$

for some $a > 0$ to be determined.

Observe now that for $y \in \mathcal{R}_1$, one has the bound

$$ q_h(y, x) = (4\pi\beta^{-1}h)^{-n/2} \exp\Big( -\frac{\beta}{4h} |x - y + h\nabla U(y)|^2 \Big) $$
$$ \qquad \le (4\pi\beta^{-1}h)^{-n/2} \exp\Big( -\frac{\beta}{8h} |x - y|^2 + \frac{\beta h}{4} |\nabla U(y)|^2 \Big) $$
$$ \qquad \le C h^{-n/2} \exp\Big( -\frac{\beta}{8h} |x - y|^2 \Big) , \qquad (5.13) $$

where $C$ depends on the choice of $a$, but not on $h$. Furthermore, we have the bound

$$ F_\theta^2(U(y)) \exp\big( \beta U(x) - \beta U(y) \big) \le F_\theta^2(U(x)) \exp\big( C |x - y| + (\beta - 2\theta)(U(x) - U(y)) \big) , \qquad (5.14) $$

where we have used Lemma 5.1 in order to obtain the last inequality. Combining
(5.14) and (5.13) and using the fact that $U(y) \ge U(x)$ on $\mathcal{R}_1$, we obtain indeed the
bound

$$ \int_{\mathcal{R}_1} F_\theta^2(U(y))\, q_h(y, x) \exp\big( \beta U(x) - \beta U(y) \big)\, dy \le C\, F_\theta^2(U(x)) . $$

Finally, in order to bound the integral over $\mathcal{R}_2$, we make use of the fact that
$q_h(y, x) \le C h^{-n/2}$, so that, combining this with (5.14), we have the bound

$$ \int_{\mathcal{R}_2} F_\theta^2(U(y))\, q_h(y, x) \exp\big( \beta U(x) - \beta U(y) \big)\, dy $$
$$ \qquad \le C h^{-n/2} F_\theta^2(U(x)) \int_{\mathcal{R}_2} \exp\big( C |x - y| + (\beta - 2\theta)(U(x) - U(y)) \big)\, dy $$
$$ \qquad \le C h^{-n/2} F_\theta^2(U(x)) \exp\big( \beta h^{-1/2} \big) \int_{\mathcal{R}_2} \exp(-\delta U(y))\, dy , $$

for some fixed constant $\delta > 0$. Here, we have made use of the fact that $U(x) \le h^{-1/2}$
by assumption, and that $U$ grows faster than quadratically by Assumption 2.1 (A).
It follows from (2.5) and the definition of $\mathcal{R}_2$ that

$$ \int_{\mathcal{R}_2} \exp(-\delta U(y))\, dy \le \exp\Big( -\frac{\delta a}{2} h^{-1/2} \Big) , $$

so that the requested bound follows, provided that we choose $a$ sufficiently large
so that $a > 2\beta\delta^{-1}$. □
The following lemma is useful to bound the average rejection probability of
MALA.

Lemma 5.5 (see also [BV10]). For every $p \in \mathbf{N}$, there exists an $h_c > 0$ and a
constant $C > 0$, such that for any $h < h_c$ the bound

$$ \mathbf{E}_x\big( |1 - \alpha_h(x, X_1^\star)|^p \big) \le C\, U^{2p}(x)\, h^{3p/2} $$

holds for all $x \in \mathbf{R}^n$ satisfying $U(x) \le h^{-1/2}$.

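Proof. Introduce the function $G : \mathbf{R}^n \times \mathbf{R}^n \to \mathbf{R}$ given by

$$ G(x, y) = U(y) - U(x) - \tfrac12 \langle \nabla U(y) + \nabla U(x),\, y - x \rangle + \frac{h}{4} \big( |\nabla U(y)|^2 - |\nabla U(x)|^2 \big) , $$

and the set

$$ R(x) = \{ y \in \mathbf{R}^n \mid G(x, y) > 1 \} . $$
By (2.16) it follows that, for $\xi$ a normally distributed random variable, one has

$$ \mathbf{E}_x |1 - \alpha_h(x, X_1^\star)|^p = \mathbf{E}\, \big| 1 - \big( 1 \wedge \exp(-\beta G(x, X^\star(\xi))) \big) \big|^p , $$

where we have used the shorthand notation

$$ X^\star(\xi) = x - h\nabla U(x) + \sqrt{2h\beta^{-1}}\, \xi . $$
Since $|1 - (1 \wedge e^{-x})| \le |x|$ for every $x \in \mathbf{R}$, it follows that

$$ \mathbf{E}_x |1 - \alpha_h(x, X_1^\star)|^p \le \mathbf{E}\, |\beta G(x, X^\star(\xi))|^p . $$
Introduce the interpolant

$$ X(t) = x - t\big( h\nabla U(x) - \sqrt{2h\beta^{-1}}\, \xi \big) , $$

so that $X(0) = x$ and $X(1) = X^\star(\xi)$. A straightforward but tedious calculation
yields the identity

$$ G(x, X^\star(\xi)) = \frac{h}{2} \int_0^1 D^2 U(X(t))\big( \nabla U(X(t)),\, X^\star(\xi) - x \big)\, dt + \frac12 \int_0^1 t(t - 1)\, D^3 U(X(t))\big( X^\star(\xi) - x \big)^3\, dt . $$

(Here we interpret $D^3 U(x) y^3$ as being the trilinear form $D^3 U(x)$ applied to the
triple $(y, y, y)$.) Note now that for all $x$ satisfying $U(x) \le 1/\sqrt{h}$, we have the
bound

$$ |X^\star(\xi) - x| \le c\sqrt{h}\, (1 + |\xi|) . $$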
On the other hand, we know from Assumption 2.1 (D) that $\|D^{(k)} U(x)\| \le C\, U(x)$
for $k = 1, 2, 3$ and it follows from Lemma 5.1 that

$$ U(X(r)) \le \exp\big( c\sqrt{h}\, (1 + |\xi|) \big)\, U(x) $$

for all $r \in [0, 1]$. Combining these bounds, we obtain

$$ |G(x, X^\star(\xi))| \le C h^{3/2}\, U^2(x)\, (1 + |\xi|)^3 \exp\big( c\sqrt{h}\, (1 + |\xi|) \big) , $$

for some constant $C > 0$. Since the expression involving $\xi$ has moments of all
orders that are independent of $h$, the result follows. □
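
The role of the function $G$ can be checked numerically. The sketch below compares $1 \wedge \exp(-\beta G(x, y))$ with the acceptance probability computed directly from a Metropolis-Hastings ratio with Gaussian Euler proposals. Since definition (2.16) is not reproduced in this section, the direct form used here is the standard Metropolis-Hastings ratio and should be read as an assumption; the quartic potential and all function names are likewise hypothetical.

```python
import numpy as np

def U(x):
    # Hypothetical potential, chosen only for illustration.
    return 1.0 + 0.25 * np.dot(x, x) ** 2

def grad_U(x):
    return np.dot(x, x) * x

def G(x, y, h):
    """The function G(x, y) from the proof of Lemma 5.5."""
    d = y - x
    gx, gy = grad_U(x), grad_U(y)
    return (U(y) - U(x) - 0.5 * np.dot(gy + gx, d)
            + 0.25 * h * (np.dot(gy, gy) - np.dot(gx, gx)))

def log_q(x, y, h, beta):
    """Log density of the Euler proposal x -> y, up to an additive constant
    that is the same in both directions and cancels in the ratio."""
    r = y - x + h * grad_U(x)
    return -beta * np.dot(r, r) / (4.0 * h)

def accept_prob_ratio(x, y, h, beta):
    """Acceptance probability from the Metropolis-Hastings ratio (assumed form)."""
    log_ratio = ((log_q(y, x, h, beta) - beta * U(y))
                 - (log_q(x, y, h, beta) - beta * U(x)))
    return float(np.exp(min(0.0, log_ratio)))

def accept_prob_G(x, y, h, beta):
    """Acceptance probability written as min(1, exp(-beta * G(x, y)))."""
    return float(np.exp(min(0.0, -beta * G(x, y, h))))

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
y = x + 0.3 * rng.standard_normal(3)
print(accept_prob_G(x, y, 0.05, 1.5), accept_prob_ratio(x, y, 0.05, 1.5))
```

Expanding the two quadratic forms in the proposal densities shows that the $|x - y|^2$ terms cancel, which leaves exactly the gradient cross term and the difference of squared gradients that appear in $G$.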

In the following lemmas we prove a local drift condition for forward Euler. As
mentioned, the 'strong' Lyapunov function $\Phi(y)$ is not integrable with respect to
the transition probability of forward Euler, since it grows faster than Gaussian tails decay.
But $U^\ell(y)$ is integrable as a consequence of Assumption 2.1 (D), which ensures
that it grows at most exponentially fast. We will show that single-step accuracy of
forward Euler implies that it locally inherits this weaker Lyapunov function.

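Lemma 5.6. Let $X_1$ denote forward Euler after one step. Then there exists a
constant $C_\ell > 0$ such that for every $E > 0$ and $\ell \in \mathbf{N}$ the bound

$$ \mathbf{E}_x(U^\ell(X_1)) \le \big( e^{-\gamma_\ell h} + C_\ell\, U(x)^2\, h^2 \big)\, U^\ell(x) + \frac{K_\ell}{\gamma_\ell} \big( 1 - e^{-\gamma_\ell h} \big) $$

holds for all $x \in \mathbf{R}^n$ satisfying $U(x) \le h^{-1/2}$.
Proof. Since

$$ U^\ell(x) \le e^{\ell C |x|}\, U^\ell(0) $$

by Lemma 5.1, $U^\ell(x)$ is integrable with respect to Gaussian measures for every
$\ell \in \mathbf{N}$. Thus, $(\tilde P_h U^\ell)(x)$ is finite (recall, $\tilde P_h$ is the transition probability for forward
Euler).

Denoting by $Y(h)$ the true solution after time $h$, we write

$$ \mathbf{E}_x(U^\ell(X_1)) = \mathbf{E}_x(U^\ell(Y(h))) + \mathbf{E}_x\big( U^\ell(X_1) - U^\ell(Y(h)) \big) . $$
Remark 2.3 then implies that there are positive constants $\gamma_\ell$ and $K_\ell$ such that

$$ \mathbf{E}_x(U^\ell(X_1)) \le e^{-\gamma_\ell h}\, U^\ell(x) + \frac{K_\ell}{\gamma_\ell} \big( 1 - e^{-\gamma_\ell h} \big) + \big| \mathbf{E}_x\big( U^\ell(X_1) - U^\ell(Y(h)) \big) \big| , $$

and the approximation result between forward Euler and the true solution given in
Lemma 5.8 below implies the desired result. □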

An immediate corollary of this bound is given by
Corollary 5.7. For every $\ell \ge 1$ there exist positive constants $a_\ell$ and $K_\ell$ such that
the bound

$$ \mathbf{E}_x\big( U(X_1) \wedge a_\ell h^{-1/2} \big)^\ell \le \big( U(x) \wedge a_\ell h^{-1/2} \big)^\ell + h K_\ell $$
holds for every $x \in \mathbf{R}^n$.

Proof. It follows from Lemma 5.6 that, provided that $a_\ell$ is small enough, one has

$$ \mathbf{E}_x\big( U(X_1) \wedge a_\ell h^{-1/2} \big)^\ell \le \mathbf{E}_x(U^\ell(X_1)) \le U^\ell(x) + h K_\ell , $$

for all $x$ such that $U(x) \le a_\ell h^{-1/2}$. On the other hand, one has the obvious bound

$$ \mathbf{E}_x\big( U(X_1) \wedge a_\ell h^{-1/2} \big)^\ell \le \big( a_\ell h^{-1/2} \big)^\ell , $$

which is valid for all $x$. Collecting both bounds concludes the proof. □

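Lemma 5.8. Let $X_1$ and $Y(h)$ denote forward Euler and the true solution after
one step, respectively. For every $\ell \in \mathbf{N}$, there exists a constant $C_\ell > 0$ such that
the bound

$$ \big| \mathbf{E}_x\big( U^\ell(X_1) - U^\ell(Y(h)) \big) \big| \le C_\ell\, h^2\, U^{\ell+2}(x) $$

holds for all $x \in \mathbf{R}^n$ satisfying $U(x) \le h^{-1/2}$ and for all $h < 1$.

Proof. Observe that a single step of forward Euler is equivalent in law to the
following Langevin diffusion with constant drift:

$$ X_1 = X(h) , $$

where the process $X$ satisfies

$$ dX = -\nabla U(x)\, dt + \sqrt{2\beta^{-1}}\, dW , \qquad X(0) = x . $$
The infinitesimal generator of this process is given by:

$$ (\mathcal{L}_h g)(y) = -\nabla U(x)^T \nabla g(y) + \beta^{-1} \Delta g(y) . $$
Since $\mathcal{L}_h U^\ell(x) = \mathcal{L} U^\ell(x)$, an exact Itô-Taylor expansion yields

$$ \mathbf{E}_x\big( U^\ell(X_1) - U^\ell(Y(h)) \big) = \int_0^h \int_0^s \mathbf{E}_x\big[ \mathcal{L}_h^2 U^\ell(X(r)) - \mathcal{L}^2 U^\ell(Y(r)) \big]\, dr\, ds . \qquad (5.15) $$

The triangle inequality implies

$$ \big| \mathbf{E}_x\big( U^\ell(X_1) - U^\ell(Y(h)) \big) \big| \le \int_0^h \int_0^s \big| \mathbf{E}_x\big[ \mathcal{L}_h^2 U^\ell(X(r)) \big] \big|\, dr\, ds + \int_0^h \int_0^s \big| \mathbf{E}_x\big( \mathcal{L}^2 U^\ell(Y(r)) \big) \big|\, dr\, ds . \qquad (5.16) $$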
Assumption 2.1 (D) implies that there exists a positive constant $C$ such that

$$ (\mathcal{L}^2 U^\ell)(y) \le C\, U^{\ell+2}(y) , \qquad (\mathcal{L}_h^2 U^\ell)(y) \le C\, U^2(x)\, U^\ell(y) , $$

for all $x, y \in \mathbf{R}^n$. These inequalities bound the integrands in (5.16). For the second
term, we have

$$ \big| \mathbf{E}_x\big[ \mathcal{L}^2 U^\ell(Y(r)) \big] \big| \le C\, \mathbf{E}_x\big( U^{\ell+2}(Y(r)) \big) \le C\, U^{\ell+2}(x) , \qquad (5.17) $$

where Remark 2.6 is used in the last step.
To bound the first term, note that

$$ \big| \mathbf{E}_x\big[ \mathcal{L}_h^2 U^\ell(X(r)) \big] \big| \le C\, U^2(x)\, \mathbf{E}_x\big[ U^\ell(X(r)) \big] \qquad (5.18) $$
where $r \in [0, h]$. The definition of an Euler step yields

$$ \mathbf{E}_x\big( U^\ell(X(r)) \big) = (2\pi)^{-n/2} \int U^\ell\big( x - r\nabla U(x) + \sqrt{2\beta^{-1} r}\, \xi \big) \exp\Big( -\frac{|\xi|^2}{2} \Big)\, d\xi . $$

Since, by hypothesis,

$$ r |\nabla U(x)| \le h |\nabla U(x)| \le C h |U(x)| \le C\sqrt{h} , $$

it follows from Lemma 5.1 that

$$ U^\ell\big( x - r\nabla U(x) + \sqrt{2\beta^{-1} r}\, \xi \big) \le \exp\Big( \ell \sqrt{h}\, \big( C + \sqrt{2\beta^{-1}}\, |\xi| \big) \Big)\, U^\ell(x) $$

for all $r \in [0, h]$. Therefore,

$$ \mathbf{E}_x\big( U^\ell(X(r)) \big) \le C\, U^\ell(x) , \qquad (5.19) $$

for some $C > 0$ independent of $h$. Combining (5.19), (5.18) and (5.17) and inserting
these bounds into (5.16) yields the required bound. □

6 Conclusion

In this paper we showed that MALA's lack of a spectral gap is not severe. In
particular, our main result, Theorem 3.1, states that its convergence to equilibrium
happens at an exponential rate up to terms exponentially small in the time-stepsize. This
quantification relies on MALA exactly preserving the SDE's invariant measure
and accurately representing the SDE's transition probability on finite time intervals.
The first property is automatic since the target distribution in the
Metropolis-Hastings step is the SDE's equilibrium distribution. Deriving the second property
requires a generalization of finite-time estimates for MALA [BV10] and forward
Euler [BT95, MSH02]. This derivation involves obtaining new results on the accuracy
of MALA and forward Euler with respect to the true solution of the SDE in
the context where the drift is not globally Lipschitz.
A key technical issue addressed in the proof of Theorem 3.1 is that MALA
locally inherits a Lyapunov function of the true solution, $\Phi(x) = \exp(\theta U(x))$. Since
$U$ grows faster than a quadratic function, the function $\Phi$ is not integrable with
respect to a Gaussian measure, including the transition probability of forward Euler.
Nevertheless, we prove integrability of $\Phi$ with respect to the transition probability
of MALA as a consequence of MALA preserving an equilibrium measure whose
tails decrease faster than $\Phi$ increases.

Finite-time accuracy implies that MALA inherits a minorization and a local drift
condition from the SDE. As a consequence, the paper proved that its mixing time is
close to the mixing time of the SDE on compact sets. The patching argument in
Theorem 3.1 compares MALA to a version of MALA with reflection on the boundary
of these compact sets to boost this local property to a global property plus terms
exponentially small in the time-stepsize.

Finally, we note that the proof of Lemma I3T21 motivates the following question: 
is forward Euler a strongly or weakly convergent method on finite time intervals? 
The answer is no, because a necessary condition for a numerical method to converge
on finite time intervals is stability, which we have shown forward Euler lacks for
nonglobally Lipschitz drifts. However, the lemma does motivate using forward
Euler as a proposal chain in the Metropolis-Hastings algorithm to sample from the 
equilibrium measure of the SDE. 